Introduction

Data is ubiquitous, and leveraging historical data for forecasting remains valuable, as history often repeats itself. Knowing the title of a book is useful, but reading it is far more so; similarly, raw data only becomes valuable once we actually analyze it.

In this project, I have analyzed well-known US stock tickers: Apple (AAPL), Adobe (ADBE), Amazon (AMZN), Tesla (TSLA), Nvidia (NVDA), and Microsoft (MSFT). The price data was sourced via the yfinance library in Python, a public and easily accessible interface to realistic market data, and the financial statements were taken from Stock Analysis.

This project is structured around four key questions, each addressed in a dedicated section:

1.) What was the change in price and volume of each stock over time, and what was their correlation over 2.51 years?

2.) What were the moving averages of the various stocks, and what was the correlation between RSI and the close price over 2.51 years?

3.) What is the financial health of each company, and what are the primary metrics to focus on?

4.) How can we predict future stock behavior for each stock using predictive models?

The structure of the report is as follows:

Pre_Section: Loading the Data

Section 1: Q1 Answer

Section 2: Q2 Answer

Section 3: Q3 Answer

Section 4: Q4 Answer

Conclusion

Pre_Section: Loading the Tickers

In [ ]:
# Install required packages

! pip install pandas seaborn plotly matplotlib numpy scipy scikit-learn keras tensorflow arch
In [17]:
# Imports

import pandas as pd
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import tensorflow as tf
from scipy import stats
from scipy.stats import skewnorm, skew, kurtosis
from matplotlib.colors import LinearSegmentedColormap
from matplotlib.patheffects import withStroke
from typing import Tuple
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import TimeSeriesSplit
from arch import arch_model
# Use the tensorflow.keras namespace consistently (the bare `keras` imports were duplicates)
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import LSTM, Dense, Dropout, Input
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l1_l2
from tensorflow.keras.metrics import RootMeanSquaredError, MeanAbsolutePercentageError
from tensorflow.keras.losses import Huber
In [5]:
# Load the data from the CSV file using a relative path
AAPL = pd.read_csv("../P2_Stocks/AAPL/AAPL.csv")

# Display the first few rows of the data
AAPL.head()
Out[5]:
Date Adj Close Simple Moving Average_50 Exponential Moving Average_50 Simple Moving Average_100 Exponential Moving Average_100 Relative Strength Index 14 Daily_Return Volume MACD Signal_Line
0 1/4/2016 23.91 26.08 25.58 25.72 25.99 30.42 0.001 270597600 -0.41 -0.410000
1 1/5/2016 23.32 26.03 25.49 25.69 25.94 29.37 -0.025 223164000 -0.45 -0.418000
2 1/6/2016 22.86 25.95 25.39 25.66 25.88 23.43 -0.020 273829600 -0.49 -0.432400
3 1/7/2016 21.89 25.86 25.25 25.62 25.80 21.41 -0.042 324377600 -0.55 -0.455920
4 1/8/2016 22.01 25.79 25.12 25.58 25.72 26.71 0.005 283192000 -0.60 -0.484736
In [67]:
ADBE = pd.read_csv("../P2_Stocks/ADBE/ADBE.csv")
# Display the data
ADBE.head()
Out[67]:
Date Adj Close Simple Moving Average_50 Exponential Moving Average_50 Simple Moving Average_100 Exponential Moving Average_100 Relative Strength Index 14 Daily_Return Volume
0 1/4/2016 91.97 91.36 90.97 86.67 87.95 45.68 -0.021 2993800
1 1/5/2016 92.34 91.44 91.02 86.76 88.04 44.22 0.004 1821300
2 1/6/2016 91.02 91.51 91.02 86.82 88.10 34.15 -0.014 1674000
3 1/7/2016 89.11 91.51 90.95 86.85 88.12 32.89 -0.021 2717800
4 1/8/2016 87.85 91.51 90.83 86.87 88.11 36.93 -0.014 2263400
In [68]:
TSLA = pd.read_csv("../P2_Stocks/TSLA/TSLA.csv")
# Display the data
TSLA.head()
Out[68]:
Date Close Simple Moving Average_50 Exponential Moving Average_50 Simple Moving Average_100 Exponential Moving Average_100 Relative Strength Index 14 Daily_Return Volume
0 1/4/2016 14.89 14.91 15.25 15.51 15.44 54.38 -0.069 102406500
1 1/5/2016 14.90 14.93 15.24 15.50 15.43 52.22 0.000 47802000
2 1/6/2016 14.60 14.94 15.21 15.49 15.42 32.25 -0.020 56686500
3 1/7/2016 14.38 14.94 15.18 15.47 15.40 30.65 -0.015 53314500
4 1/8/2016 14.07 14.94 15.13 15.44 15.37 29.54 -0.022 54421500
In [69]:
AMZN = pd.read_csv("../P2_Stocks/AMZN/AMZN.csv")
# Display the data
AMZN.head()
Out[69]:
Date Close Simple Moving Average_50 Exponential Moving Average_50 Simple Moving Average_100 Exponential Moving Average_100 Relative Strength Index 14 Daily_Return Volume
0 1/4/2016 31.85 32.77 32.23 29.55 0.93 41.34 -0.058 186290000
1 1/5/2016 31.69 32.84 32.21 29.61 0.91 39.92 -0.005 116452000
2 1/6/2016 31.63 32.87 32.18 29.66 0.92 29.91 -0.002 106584000
3 1/7/2016 30.40 32.87 32.11 29.70 0.96 25.29 -0.039 141498000
4 1/8/2016 30.35 32.87 32.05 29.73 0.95 26.47 -0.001 110258000
In [70]:
MSFT = pd.read_csv("../P2_Stocks/MSFT/MSFT.csv", low_memory=False)
# Display the data
MSFT.head()
Out[70]:
Date Adj Close Simple Moving Average_50 Exponential Moving Average_50 Simple Moving Average_100 Exponential Moving Average_100 Relative Strength Index 14 Daily Returns Volume
0 1/4/2016 48.52 48.05 47.40 43.75 45.16 47.72 -0.012 53778000
1 1/5/2016 48.74 48.18 47.45 43.83 45.23 49.02 0.005 34079700
2 1/6/2016 47.86 48.21 47.47 43.90 45.29 36.53 -0.018 39518900
3 1/7/2016 46.19 48.18 47.42 43.95 45.30 30.75 -0.035 56564900
4 1/8/2016 46.33 48.16 47.38 44.00 45.32 38.40 0.003 48754000
In [71]:
NVDA = pd.read_csv("../P2_Stocks/NVDA/NVDA.csv")
# Display the data
NVDA.head()
Out[71]:
Date Close Simple Moving Average_50 Exponential Moving Average_50 Simple Moving Average_100 Exponential Moving Average_100 Relative Strength Index 14 Daily_Return Volume
0 1/4/2016 0.809 0.78 0.78 0.69 0.71 47.80 -0.018 358076000
1 1/5/2016 0.822 0.78 0.78 0.69 0.72 49.14 0.016 490272000
2 1/6/2016 0.788 0.79 0.78 0.69 0.72 35.91 -0.041 449344000
3 1/7/2016 0.757 0.79 0.78 0.70 0.72 31.81 -0.040 645304000
4 1/8/2016 0.741 0.79 0.78 0.70 0.72 31.24 -0.021 398472000
In [72]:
ATL = pd.read_excel("../python/financial_ratios.xlsx", sheet_name=0)
# Display the data
ATL.head()
Out[72]:
Quarter Ending Asset to Liability Ratio AAPL Asset to Liability Ratio ADBE Asset to Liability Ratio AMZN Asset to Liability Ratio MSFT Asset to Liability Ratio NVDA Asset to Liability Ratio TSLA
0 2019-06-29 1.427207 2.043926 1.383694 1.555459 3.328452 1.289257
1 2019-09-28 1.364830 2.029124 1.396294 1.613445 3.439948 1.295579
2 2019-12-28 1.356574 1.973579 1.380298 1.637629 3.387791 1.309554
3 2020-03-28 1.324104 2.014829 1.418501 1.669800 2.289906 1.404706
4 2020-06-27 1.294954 2.094571 1.399424 1.646445 2.235043 1.425394
In [73]:
DTE = pd.read_excel("../python/financial_ratios.xlsx", sheet_name=1)
# Display the data
DTE.head()
Out[73]:
Quarter Ending D/E Ratio AAPL D/E Ratio ADBE D/E Ratio AMZN D/E Ratio MSFT D/E Ratio NVDA D/E Ratio TSLA
0 2019-06-29 1.12 0.40 1.10 0.77 0.24 2.28
1 2019-09-28 1.19 0.39 1.05 0.72 0.22 2.21
2 2019-12-28 1.21 0.45 1.02 0.70 0.21 2.03
3 2020-03-28 1.40 0.43 0.98 0.65 0.57 1.52
4 2020-06-27 1.56 0.40 1.03 0.60 0.54 1.43
In [75]:
EBITM = pd.read_excel("../python/financial_ratios.xlsx", sheet_name=3)
# Display the data
EBITM.head()
Out[75]:
Quarter Ending EBIT Margin AAPL EBIT Margin ADBE EBIT Margin AMZN EBIT Margin MSFT EBIT Margin NVDA EBIT Margin TSLA
0 2019-06-29 0.2375 0.3088 0.0515 0.3934 0.2400 -0.0342
1 2019-09-28 0.2645 0.3319 0.0432 0.4031 0.3225 0.0562
2 2019-12-28 0.2908 0.3080 0.0516 0.3994 0.3314 0.0429
3 2020-03-28 0.2382 0.3286 0.0488 0.3843 0.3266 0.0312
4 2020-06-27 0.2318 0.3374 0.0746 0.3709 0.1715 0.0489
In [78]:
PBR = pd.read_excel("../python/financial_ratios.xlsx", sheet_name=6)
# Display the data
PBR.head()
Out[78]:
Quarter Ending P/B Ratio AAPL P/B Ratio ADBE P/B Ratio AMZN P/B Ratio MSFT P/B Ratio NVDA P/B Ratio TSLA
0 2019-06-29 9.04 13.50 17.60 9.56 10.42 6.930233
1 2019-09-28 10.52 14.20 15.20 9.58 11.12 7.137778
2 2019-12-28 13.89 15.90 14.81 10.48 12.52 11.383673
3 2020-03-28 13.40 17.09 14.88 10.08 13.66 10.458084
4 2020-06-27 20.61 21.17 18.72 12.59 18.20 20.336158
In [79]:
ROE = pd.read_excel("../python/financial_ratios.xlsx", sheet_name=8)
# Display the data
ROE.head()
Out[79]:
Quarter Ending ROE AAPL ROE ADBE ROE AMZN ROE MSFT ROE NVDA ROE TSLA
0 2019-06-29 0.10 0.08 0.05 0.13 0.05 -0.07
1 2019-09-28 0.15 0.08 0.04 0.10 0.08 0.02
2 2019-12-28 0.25 0.09 0.05 0.11 0.08 0.02
3 2020-03-28 0.14 0.10 0.04 0.09 0.07 0.00
4 2020-06-27 0.16 0.08 0.07 0.09 0.04 0.01

Section 1: Q1 Answer

1.) What was the change in price and volume of each stock over time, and what was their correlation over 2.51 years?

Definitions:
Close Price:

Definition: The close price is the final price at which a security is traded on a given trading day.

Usage: It represents the last transaction price before the market closes.

Significance: Investors and analysts use the close price to assess the daily performance of a security and make trading decisions.

Adjusted Close Price:

Definition: The adjusted close price is the close price adjusted for any corporate actions that occurred before the next trading day.

Adjustments Include:

Dividends: Cash dividends paid out to shareholders are accounted for. The stock price typically drops by the dividend amount on the ex-dividend date.

Stock Splits: If a company splits its stock, increasing the number of shares while reducing the price per share proportionally, the adjusted close price reflects this change.

Other Corporate Actions: Such as spin-offs.
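As a toy illustration of the split adjustment (the prices below are made up, not taken from this project's data): after a 2-for-1 split, every pre-split price is scaled by the split factor of 0.5, so the adjusted series stays continuous across the split date.

```python
import pandas as pd

# Hypothetical raw close prices around a 2-for-1 split on the third day
raw_close = pd.Series([100.0, 102.0, 51.5, 52.0])
split_factor = pd.Series([0.5, 0.5, 1.0, 1.0])  # pre-split prices scaled by 1/2

adj_close = raw_close * split_factor
print(adj_close.tolist())  # [50.0, 51.0, 51.5, 52.0] -- no artificial 50% "crash"
```

Without the adjustment, the raw series would show a spurious drop from 102 to 51.5 that never reflected a real loss in value.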

Trading Volume:

Trading volume refers to the total number of shares or contracts traded for a specific security during a given period. This metric can be measured over different time frames, such as a single trading day, a week, a month, or any other period. Trading volume is a crucial indicator in financial markets because it provides insight into the activity and liquidity of a security.
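Aggregating daily volume into the longer time frames mentioned above is a one-liner with pandas `resample` (a sketch on synthetic data; the real series would come from the CSVs loaded earlier):

```python
import pandas as pd
import numpy as np

# Hypothetical daily volume over ten business days starting Mon 2016-01-04
dates = pd.date_range("2016-01-04", periods=10, freq="B")
daily = pd.DataFrame({"Volume": np.arange(1, 11) * 1_000_000}, index=dates)

# Re-aggregate the same series on a weekly time frame (weeks end on Sunday)
weekly = daily["Volume"].resample("W").sum()
print(weekly.tolist())  # [15000000, 40000000]
```

The same pattern with `"M"` (or `"ME"` in recent pandas) gives monthly totals.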

In [80]:
def get_column_names(file_path):
    df = pd.read_csv(file_path, nrows=5)
    date_col = next((col for col in df.columns if 'date' in col.lower()), None)
    price_col = next((col for col in df.columns if 'close' in col.lower() or 'price' in col.lower()), None)
    volume_col = next((col for col in df.columns if 'volume' in col.lower()), None)
    if not all([date_col, price_col, volume_col]):
        print(f"Warning: Could not identify all required columns in {file_path}")
        print(f"Available columns: {df.columns.tolist()}")
    return date_col, price_col, volume_col

def plot_individual_trend_and_volume_graphs(file_paths, tickers):
    for ticker, file_path in zip(tickers, file_paths):
        date_column, price_column, volume_column = get_column_names(file_path)
        
        if not all([date_column, price_column, volume_column]):
            print(f"Skipping {ticker} due to missing column information")
            continue
        
        df = pd.read_csv(file_path, parse_dates=[date_column], low_memory=False)
        df = df.dropna(subset=[date_column, price_column, volume_column])
        df = df.sort_values(by=date_column)
        
        # Plot stock price, volume, and correlation
        fig = plt.figure(figsize=(18, 6), dpi=100)
        gs = fig.add_gridspec(1, 3, width_ratios=[1, 1, 1.2])
        
        ax1 = fig.add_subplot(gs[0, 0])
        ax2 = fig.add_subplot(gs[0, 1])
        ax3 = fig.add_subplot(gs[0, 2])
        
        # Plot stock price
        ax1.plot(df[date_column], df[price_column], label=f'{ticker} Price', color='#0F3D6E', linewidth=2)
        x = np.arange(len(df))
        y = df[price_column]
        slope, intercept, _, _, _ = stats.linregress(x, y)
        line = slope * x + intercept
        ax1.plot(df[date_column], line, color='#D3D04F', linestyle='--', linewidth=3, label='Trend')
        
        ax1.xaxis.set_major_locator(mdates.YearLocator())
        ax1.xaxis.set_minor_locator(mdates.MonthLocator())
        ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
        ax1.tick_params(axis='x', rotation=45, which='major', labelcolor='black')
        ax1.set_xlim(left=df[date_column].min(), right=df[date_column].max())
        ax1.grid(True, which='both', linestyle=':', linewidth=0.5)
        ax1.legend(fontsize=10, loc='upper left', labelcolor='black', prop={'size': 10})
        
        ax1.set_xlabel('Date', fontsize=14, color='black')
        ax1.set_ylabel('Price ($)', fontsize=14, color='black')
        
        fig.suptitle(f'Stock Analysis for {ticker}', fontsize=24, fontweight='bold', color='#4A249D', x=0.52)
        
        # Plot trading volume
        ax2.fill_between(df[date_column], df[volume_column], color='#32cd32', alpha=0.3, label=f'{ticker} Volume')
        ax2.plot(df[date_column], df[volume_column], color='#32cd32', linewidth=1)
        y = df[volume_column]
        slope, intercept, _, _, _ = stats.linregress(x, y)
        line = slope * x + intercept
        ax2.plot(df[date_column], line, color='#FFA41B', linestyle='--', linewidth=3, label='Trend')
        
        ax2.xaxis.set_major_locator(mdates.YearLocator())
        ax2.xaxis.set_minor_locator(mdates.MonthLocator())
        ax2.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
        ax2.tick_params(axis='x', rotation=45, which='major', labelcolor='black')
        ax2.set_xlim(left=df[date_column].min(), right=df[date_column].max())
        ax2.grid(True, which='both', linestyle=':', linewidth=0.5)
        ax2.legend(fontsize=10, loc='upper left', labelcolor='black', prop={'size': 10})
        
        ax2.set_xlabel('Date', fontsize=14, color='black')
        ax2.set_ylabel('Volume', fontsize=14, color='black')
        
        # Calculate correlation
        correlation, p_value = stats.pearsonr(df[price_column], df[volume_column])
        
        # Create a custom colormap
        colors = ["#00FFFF", "#FF00FF", "#FFFF00"]
        n_bins = 100
        cmap = LinearSegmentedColormap.from_list("custom", colors, N=n_bins)

        # Create the scatter plot
        scatter = ax3.scatter(df[price_column], df[volume_column], s=100, c=df[price_column], cmap=cmap, alpha=0.7, edgecolors='white')

        # Add a trend line
        sns.regplot(data=df, x=price_column, y=volume_column, ax=ax3, scatter=False, color='#DAFFFB', line_kws={'linewidth': 3})

        # Customize the plot
        title = ax3.set_title(f'Correlation: {correlation:.2f}', fontsize=18, color='#9CAFAA')
        title.set_path_effects([withStroke(linewidth=2, foreground='#1e90ff')])
        ax3.set_xlabel('Close Price', fontsize=14, color='black')
        ax3.set_ylabel('Volume', fontsize=14, color='black')

        # Add a colorful background gradient
        gradient = np.linspace(0, 1, 256).reshape(1, -1)
        ax3.imshow(gradient, extent=[ax3.get_xlim()[0], ax3.get_xlim()[1], ax3.get_ylim()[0], ax3.get_ylim()[1]], 
                   aspect='auto', alpha=0.3, cmap='plasma')

        # Add grid lines
        ax3.grid(color='gray', linestyle='--', linewidth=0.5, alpha=0.5)

        # Add a shiny effect
        for spine in ax3.spines.values():
            spine.set_visible(False)

        # Add glowing effect to data points (use .iloc: the index may have gaps after dropna)
        for i in range(len(df)):
            ax3.add_artist(plt.Circle((df[price_column].iloc[i], df[volume_column].iloc[i]), 0.5, color='white', alpha=0.3))

        # Add some sparkles
        for _ in range(100):
            x = np.random.uniform(ax3.get_xlim()[0], ax3.get_xlim()[1])
            y = np.random.uniform(ax3.get_ylim()[0], ax3.get_ylim()[1])
            ax3.plot(x, y, 'w*', markersize=np.random.randint(1, 7), alpha=np.random.uniform(0.3, 1))

        # Add a colorbar
        cbar = plt.colorbar(scatter, ax=ax3)
        cbar.set_label('Close Price', fontsize=14, color='black')
        cbar.ax.tick_params(labelsize=8, colors='black')

        # Set background color for axes
        ax1.set_facecolor('#deae9f')
        ax2.set_facecolor('#deae9f')
        ax3.set_facecolor('#deae9f')

        # Set the background color for the figure
        fig.patch.set_facecolor('#f7ebe7')
        plt.tight_layout(rect=[0, 0, 1, 0.97])

        # Save figures
        plt.savefig(f'{ticker}_price_volume_correlation_analysis.svg', format='svg', bbox_inches='tight')
        plt.show()

# Our stock data
stocks = {
    "AAPL": "../P2_Stocks/AAPL/AAPL_2.csv",
    "ADBE": "../P2_Stocks/ADBE/ADBE_2.csv",
    "AMZN": "../P2_Stocks/AMZN/AMZN_2.csv",
    "MSFT": "../P2_Stocks/MSFT/MSFT_2.csv",
    "NVDA": "../P2_Stocks/NVDA/NVDA_2.csv",
    "TSLA": "../P2_Stocks/TSLA/TSLA_2.csv"
}

plot_individual_trend_and_volume_graphs(
    file_paths=list(stocks.values()),
    tickers=list(stocks.keys())
)
[Figures: price trend, volume trend, and price-volume correlation panels for AAPL, ADBE, AMZN, MSFT, NVDA, and TSLA]

To assess market and stock liquidity effectively and ensure our calculations and results rest on accurate analysis, it is crucial to consider the correlation between price and volume. A high correlation between these factors can provide valuable insights into market sentiment and liquidity, and volume serves as a strong indicator of the strength behind price movements.

Conclusion

You may have wondered about the benefits of visualizing trading volume and how it connects with the closing price. Let's discuss some insights, which, while not universally true, are often observed.

We can distinguish several scenarios:

Price Increase with Decreasing Volume (Negative Correlation):

This might be a warning sign of a potential reversal. If the price is rising but the volume is decreasing, it suggests that fewer investors are supporting the price increase, indicating potential weakness.

ADBE=> -0.09: Least negative, indicating a very weak negative relationship.

NVDA=> -0.19: Weak negative relationship.

AAPL=> -0.33: Weak to moderate negative relationship.

MSFT=> -0.34: Moderate negative relationship.

AMZN=> -0.37: Moderate negative relationship.

That almost all the tickers fall under this scenario is unlikely to be chance: these are globally known companies trusted by many investors.

Price Decrease with Increasing Volume (Negative Correlation):

This indicates strong selling interest and can confirm a downtrend. High volume on price decreases suggests that the price movement is driven by significant selling pressure.

You may wonder why the price decreases in this scenario. It is due to the relationship between supply and demand. When investors have many shares they are willing to sell but there are not enough buyers, supply exceeds demand, leading to a decrease in price.

TSLA=> -0.40: Most negative, indicating a moderate negative relationship.

Non-Needed Scenarios (Feel Free to Skip Them)

Price Increase with Increasing Volume (Positive Correlation):

This indicates strong buying interest and can confirm an uptrend. High volume on price increases suggests that the price movement is supported by strong market participation.

*No stock falls under this scenario.*


Price Decrease with Decreasing Volume (Positive Correlation):

This can indicate a lack of conviction in the downtrend. If the price is falling but the volume is decreasing, it suggests that fewer investors are participating in the sell-off, indicating potential stabilization.

To clarify further: When a stock’s price is falling but the volume is also decreasing, it suggests that the intense selling pressure is diminishing. Fewer investors are participating in the sell-off, meaning that the downward movement is not strongly supported by a large number of sellers. As a result, the stock might be traded at a discount, and other investors may hold onto the stock in anticipation of a potential reversal or due to emotional biases.

*No stock falls under this scenario.*

In [81]:
# Load the CSV files
aapl = pd.read_csv('../P2_Stocks/AAPL/AAPL_2.csv')
adbe = pd.read_csv('../P2_Stocks/ADBE/ADBE_2.csv')
amzn = pd.read_csv('../P2_Stocks/AMZN/AMZN_2.csv')
msft = pd.read_csv('../P2_Stocks/MSFT/MSFT_2.csv')
nvda = pd.read_csv('../P2_Stocks/NVDA/NVDA_2.csv')
tsla = pd.read_csv('../P2_Stocks/TSLA/TSLA_2.csv')

# Rename the columns to have consistent names
aapl_close = aapl.rename(columns={'date': 'Date', 'close': 'AAPL'})[['Date', 'AAPL']]
adbe_close = adbe.rename(columns={'Date': 'Date', 'Adj Close': 'ADBE'})[['Date', 'ADBE']]
amzn_close = amzn.rename(columns={'Date': 'Date', 'Close': 'AMZN'})[['Date', 'AMZN']]
msft_close = msft.rename(columns={'Date': 'Date', 'Adj Close': 'MSFT'})[['Date', 'MSFT']]
nvda_close = nvda.rename(columns={'Date': 'Date', 'Close': 'NVDA'})[['Date', 'NVDA']]
tsla_close = tsla.rename(columns={'Date': 'Date', 'Close': 'TSLA'})[['Date', 'TSLA']]

# Merge the dataframes on the Date column
merged_data = aapl_close.merge(adbe_close, on='Date').merge(amzn_close, on='Date').merge(msft_close, on='Date').merge(nvda_close, on='Date').merge(tsla_close, on='Date')

# Calculate the correlation matrix
correlation_matrix = merged_data.drop(columns=['Date']).corr()

# Create a heatmap for the correlation matrix
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, linewidths=0.5, fmt=".2f")
plt.title('Correlation Matrix of Closing Prices')

# Set background color for the plot area
plt.gca().set_facecolor('black')

# Save the heatmap before plt.show(), which clears the current figure
plt.savefig('correlation_matrix_heatmap.png')
plt.show()
[Figure: correlation matrix heatmap of the six tickers' closing prices]

The last graph was created for hedging purposes and portfolio management. It provides insights into managing risk and optimizing the portfolio's performance by showing how each ticker's closing price moves relative to the others.
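One simple way such a matrix feeds into hedging is picking the least-correlated pair as diversification candidates. A minimal sketch (the 3x3 matrix below is made up for illustration; the real one is `correlation_matrix` computed above):

```python
import pandas as pd
import numpy as np

# Hypothetical correlation matrix standing in for the one computed above
corr = pd.DataFrame(
    [[1.00, 0.85, 0.60],
     [0.85, 1.00, 0.40],
     [0.60, 0.40, 1.00]],
    index=["AAPL", "MSFT", "TSLA"], columns=["AAPL", "MSFT", "TSLA"],
)

# Blank out the diagonal, then find the least-correlated pair
off_diag = corr.where(~np.eye(len(corr), dtype=bool))
pair = off_diag.stack().idxmin()
print(pair)  # ('MSFT', 'TSLA')
```

Combining assets whose prices move less in lockstep reduces the chance that the whole portfolio draws down at once.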

Section 2: Q2 Answer

2.) What were the moving averages of the various stocks, and what was the correlation between RSI and the close price over 2.51 years?

After analyzing and identifying the trends in stock price and volume for each ticker, we now do the same with the SMA, EMA, and RSI to see how the average price trended over each stock's life.

*What is the difference between SMA and EMA?*

Simple Moving Average (SMA):

Definition: SMA smooths out price data by averaging it over a specific number of periods, assigning equal weight to every price in the window.

Trend Indication: A rising SMA indicates an uptrend, while a falling SMA indicates a downtrend.

Lagging Indicator: Due to its equal weighting, SMA tends to lag more than EMA.

Exponential Moving Average (EMA):

Definition: EMA gives more weight to recent prices, making it more responsive to new information compared to the SMA.

Trend Indication: Similar to SMA, but it reacts more quickly to price changes due to its weighting formula.

Lagging Indicator: EMA lags less than SMA, making it more sensitive to recent price movements.

Importance of Analyzing SMA and EMA => By analyzing the movements of the SMA and EMA, you can better understand price trends over a stock's life. The SMA provides a smoother, long-term perspective, while the EMA reacts more immediately to recent price changes, allowing for timely trading decisions. (In this analysis, however, they all track each other closely.)
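The SMA/EMA contrast above can be sketched with pandas in two lines (toy prices and a 3-period window for readability, rather than the 50-period columns shipped in this project's CSVs):

```python
import pandas as pd

# Hypothetical closing prices in a steady uptrend
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

sma_3 = close.rolling(window=3).mean()          # equal weight on each of the 3 prices
ema_3 = close.ewm(span=3, adjust=False).mean()  # more weight on the most recent price

print(sma_3.iloc[-1])  # 14.0
print(ema_3.iloc[-1])  # 14.03125 -- slightly above the SMA, as it tracks the rise faster
```

In this uptrend the EMA sits above the SMA, illustrating its quicker response to recent prices.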

Relative Strength Index (RSI):

RSI measures the speed and change of price movements, oscillating between 0 and 100:

Overbought Condition: An RSI above 70 indicates that the asset has been bought aggressively and its price may be overextended to the upside, suggesting a potential sell opportunity.

Oversold Condition: An RSI below 30 indicates that the asset has been sold aggressively and its price may be undervalued, suggesting a potential buy opportunity.
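The project's CSVs already ship a "Relative Strength Index 14" column; the sketch below shows one common way such a column can be computed (Wilder's smoothing) purely as an illustration, not necessarily the exact formula used to build the data:

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Wilder-smoothed RSI: 100 - 100 / (1 + average gain / average loss)."""
    delta = close.diff()
    gain = delta.clip(lower=0)                 # positive moves only
    loss = -delta.clip(upper=0)                # negative moves, made positive
    avg_gain = gain.ewm(alpha=1 / period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False).mean()
    return 100 - 100 / (1 + avg_gain / avg_loss)

# A steadily rising series pins the RSI at its overbought extreme
rising = pd.Series(range(1, 31), dtype=float)
print(rsi(rising).iloc[-1])  # 100.0
```

With zero average loss the ratio diverges and the RSI saturates at 100; a steadily falling series would symmetrically pin it at 0.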

Comprehensive Analysis Approach => Below we plot the interaction of Price, SMA_50, EMA_50, and RSI_14 together from 2022 to 2024, giving a comprehensive view of each stock's trend, momentum, and potential reversal points.

This multi-faceted approach enhances the reliability of trading signals and helps make more informed investment decisions.

So Let's See:

In [82]:
plt.style.use('ggplot')

def get_column_names(file_path):
    df = pd.read_csv(file_path, nrows=5)
    date_col = next((col for col in df.columns if 'date' in col.lower()), None)
    price_col = next((col for col in df.columns if 'close' in col.lower() or 'price' in col.lower()), None)
    volume_col = next((col for col in df.columns if 'volume' in col.lower()), None)
    if not all([date_col, price_col, volume_col]):
        print(f"Warning: Could not identify all required columns in {file_path}")
        print(f"Available columns: {df.columns.tolist()}")
    return date_col, price_col, volume_col

def load_and_preprocess_data(data_path: str) -> pd.DataFrame:
    """
    Load and preprocess the stock data from a CSV file.
    """
    data = pd.read_csv(data_path)
    date_column = 'date' if 'date' in data.columns else 'Date'
    data[date_column] = pd.to_datetime(data[date_column], format='%m/%d/%Y')
    data = data.sort_values(date_column)
    data = data.iloc[::5, :]  # Downsample data for clarity and size reduction
    return data

def plot_moving_averages(ax: plt.Axes, data: pd.DataFrame, date_column: str, close_column: str):
    """Plot SMA, EMA, and close price with trend lines."""
    z_sma = np.polyfit(data.index, data['Simple Moving Average_50'], 1)
    p_sma = np.poly1d(z_sma)
    z_ema = np.polyfit(data.index, data['Exponential Moving Average_50'], 1)
    p_ema = np.poly1d(z_ema)
    z_close = np.polyfit(data.index, data[close_column], 1)
    p_close = np.poly1d(z_close)

    ax.plot(data[date_column], data['Simple Moving Average_50'], label='SMA_50', color='#4CB9E7', linewidth=2)
    ax.plot(data[date_column], data['Exponential Moving Average_50'], label='EMA_50', color='#3E3232', linewidth=2)
    ax.plot(data[date_column], data[close_column], label='Close', color='purple', linewidth=2)
    ax.plot(data[date_column], p_sma(data.index), label='Trend Line (SMA_50)', color='green', linestyle='--', linewidth=2)
    ax.plot(data[date_column], p_ema(data.index), label='Trend Line (EMA_50)', color='orange', linestyle='--', linewidth=2)
    ax.plot(data[date_column], p_close(data.index), label='Trend Line (Close)', color='#363062', linestyle=':', linewidth=3)

    ax.set_xlabel('Date', fontsize=12, color='black')
    ax.set_ylabel('Values', fontsize=12, color='black')
    ax.legend(fontsize=10, labelcolor='black', prop={'size': 10})
    ax.grid(True)
    ax.tick_params(axis='x', rotation=45, labelsize=10, colors='black')
    ax.tick_params(axis='y', labelsize=10, colors='black')

def plot_rsi(ax: plt.Axes, data: pd.DataFrame, date_column: str):
    """Plot RSI with trend line and overbought/oversold levels."""
    z_rsi = np.polyfit(data.index, data['Relative Strength Index 14'], 1)
    p_rsi = np.poly1d(z_rsi)

    ax.plot(data[date_column], data['Relative Strength Index 14'], label='RSI_14', linestyle=':', linewidth=2, color='#7776B3')
    ax.plot(data[date_column], p_rsi(data.index), label='Trend Line (RSI_14)', color='#FF7EE2', linestyle='--', linewidth=2)
    ax.axhline(70, linestyle='-', alpha=0.7, color='red', label='Overbought (70)', linewidth=2)
    ax.axhline(30, linestyle='-', alpha=0.7, color='green', label='Oversold (30)', linewidth=2)

    ax.set_xlabel('Date', fontsize=12, color='black')
    ax.set_ylabel('RSI', fontsize=12, color='black')
    ax.legend(fontsize=10, labelcolor='black', prop={'size': 10}, loc='upper left', bbox_to_anchor=(0, 1), bbox_transform=ax.transAxes)
    ax.grid(True)
    ax.tick_params(axis='x', rotation=45, labelsize=10, colors='black')
    ax.tick_params(axis='y', labelsize=10, colors='black')

def plot_price_rsi_kde(ax: plt.Axes, data: pd.DataFrame, close_column: str):
    """Plot enhanced KDE of price and RSI with correlation information."""
    correlation, _ = stats.pearsonr(data[close_column], data['Relative Strength Index 14'])
    
    # Create a custom colormap
    cmap = LinearSegmentedColormap.from_list("custom", ["#4575B4", "#FFFFBF", "#D73027"])

    # Plot the KDE
    sns.kdeplot(
        x=data[close_column],
        y=data['Relative Strength Index 14'],
        ax=ax,
        cmap=cmap,
        fill=True,
        cbar=True,
        cbar_kws={'label': 'Density'},
        levels=20,
        alpha=0.7
    )
    
    # Add a scatter plot with low alpha for individual points
    ax.scatter(data[close_column], data['Relative Strength Index 14'], 
               color='gray', alpha=0.3, s=10, edgecolors='#DA7297', linewidth=0.5)
    
    # Add regression line
    sns.regplot(
        x=data[close_column],
        y=data['Relative Strength Index 14'],
        ax=ax,
        scatter=False,
        color='#003C43',
        line_kws={'linewidth': 2, 'linestyle': '--'}
    )
    
    # Customize the plot
    ax.set_xlabel('Close Price', fontsize=12, color='black')
    ax.set_ylabel('RSI_14', fontsize=12, color='black')
    ax.grid(True, alpha=0.3, linestyle='--', color='gray')
    
    # Add correlation text with a subtle background
    corr_color = '#1a9850' if correlation > 0 else '#d73027'
    text = ax.text(0.05, 0.95, f'Correlation: {correlation:.2f}', 
                   transform=ax.transAxes, fontsize=14, fontweight='bold',
                   verticalalignment='top', color=corr_color,
                   bbox=dict(facecolor='#F0EBE3', alpha=0.7, edgecolor='none', pad=5))
    text.set_path_effects([withStroke(linewidth=3, foreground='#F0EBE3')])

    # Adjust color bar
    cbar = ax.collections[0].colorbar
    cbar.set_label('Density', fontsize=10, color='black')
    cbar.ax.tick_params(labelsize=8, colors='black')

    # Set aspect ratio to 'auto' for better visualization
    ax.set_aspect('auto')

    return ax

def create_subplot(data_path: str, stock_name: str, axs: Tuple[plt.Axes, plt.Axes, plt.Axes]):
    """Create subplots for a single stock."""
    data = load_and_preprocess_data(data_path)
    date_column = 'date' if 'date' in data.columns else 'Date'
    close_column = 'close' if 'close' in data.columns else 'Adj Close' if 'Adj Close' in data.columns else 'Close'

    plot_moving_averages(axs[0], data, date_column, close_column)
    plot_rsi(axs[1], data, date_column)
    plot_price_rsi_kde(axs[2], data, close_column)

    for ax in axs:
        ax.set_facecolor('#a3b7ca')  # Set background color for axes

def plot_stocks_analysis(stock_paths: dict):
    """
    Plot analysis for multiple stocks.
    """
    for stock, path in stock_paths.items():
        fig, axs = plt.subplots(1, 3, figsize=(20, 6), dpi=150)
        create_subplot(path, stock, axs)
        fig.suptitle(f'Stock Analysis for {stock}', fontsize=24, fontweight='bold', color='#4A249D', x=0.52)
        fig.patch.set_facecolor('#d1dbe4')  # Set background color for figure
        plt.tight_layout(rect=[0, 0, 1, 0.97])
        plt.savefig(f'stock_analysis_{stock}.png', format='png', dpi=150, bbox_inches='tight')
        plt.show()

def main():
    """Main function to run the stock analysis."""
    stock_paths = {
        "AAPL": "../P2_Stocks/AAPL/AAPL_2.csv",
        "ADBE": "../P2_Stocks/ADBE/ADBE_2.csv",
        "AMZN": "../P2_Stocks/AMZN/AMZN_2.csv",
        "MSFT": "../P2_Stocks/MSFT/MSFT_2.csv",
        "NVDA": "../P2_Stocks/NVDA/NVDA_2.csv",
        "TSLA": "../P2_Stocks/TSLA/TSLA_2.csv"
    }

    plot_stocks_analysis(stock_paths)

if __name__ == "__main__":
    main()
[Figures: moving-average, RSI, and price-RSI density panels for each of the six tickers]

Conclusion

Regarding the stocks and their moving averages: all have surpassed their 2023 averages to date, indicating a potential upward trend in the coming years. Even TSLA suggests a possible reversal after a downward trend.

Regarding the RSI (Relative Strength Index) and its correlation with the stock price, we can identify several scenarios:

Both Price and RSI are Increasing (positive correlation):

Strong Bullish Trend: When both the price of a stock and its RSI are increasing, it indicates that the stock is experiencing a robust upward movement. This is because the RSI, which measures the magnitude of recent price changes, is also moving higher, suggesting strong buying momentum.

Actions:

Continue holding or buy more shares: Given the strong bullish trend, it might be advantageous to hold onto current positions or even add more shares, expecting the uptrend to continue.

Monitor for bearish divergences: Bearish divergence occurs when the price makes higher highs but the RSI makes lower highs. This could indicate weakening momentum and a potential reversal, so it's essential to watch for this warning sign.

Let's analyze each stock given all this information:

AAPL, MSFT, TSLA, NVDA: These tickers have a moderate positive correlation between price and RSI (33%, 34%, 27%, and 29% respectively), indicating potential for higher prices in the coming years and a chance to hit the overbought line.

ADBE, AMZN: These tickers have a weak positive correlation between price and RSI (19% for ADBE, 21% for AMZN), which might indicate a trend reversal, as there is no significant buying pressure on these stocks.
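The price/RSI correlations quoted above can be reproduced directly with pandas. A minimal sketch on made-up numbers (the column names `Close` and `RSI_14` are assumptions; match them to the actual CSV schema):

```python
import pandas as pd

# Made-up sample; 'Close' and 'RSI_14' are hypothetical column names.
data = pd.DataFrame({
    "Close":  [100.0, 101.5, 99.8, 102.3, 103.1, 101.9, 104.2, 105.0],
    "RSI_14": [48.0, 55.2, 44.1, 58.7, 61.3, 52.4, 64.8, 66.1],
})

# Pearson correlation between closing price and RSI
corr = data["Close"].corr(data["RSI_14"])
print(f"Price-RSI correlation: {corr:.0%}")
```

On the real per-ticker data, the same `.corr()` call yields the percentages listed above.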

Other Scenarios (Feel Free to Skip Them)

Both Price and RSI are Decreasing (positive correlation):

Strong Bearish Trend: When both the price and RSI are decreasing, it suggests that the stock is in a downtrend, with increasing selling pressure.

Momentum Confirmation: The declining RSI indicates that the momentum is in favor of the sellers. An RSI below 30 is typically considered oversold, but in a strong downtrend it can remain low.

Actions:

Consider selling or avoiding buying: To protect against further losses, it may be prudent to sell current positions or avoid buying new shares.

Monitor for bullish divergences: Bullish divergence occurs when the price makes lower lows but the RSI makes higher lows. This could signal that the downtrend is losing steam and a reversal may be on the horizon, so it's crucial to watch for this indication.


Price is Increasing, but RSI is Decreasing (negative correlation):

Bearish Divergence: This occurs when the stock price makes new highs, but the RSI fails to follow suit and makes lower highs. This divergence indicates that the underlying momentum driving the price increase is weakening.

Potential Reversal: While the price continues to rise, the decreasing RSI suggests that the buying pressure is diminishing, which could lead to a reversal.

Actions:

Be cautious about new purchases: Given the weakening momentum, it might not be the best time to buy additional shares, as the uptrend could be nearing its end.

Consider taking profits or tightening stop-loss orders: Protecting gains by taking profits can be a wise strategy. Alternatively, tightening stop-loss orders can limit potential losses if the trend reverses.

Watch closely for reversal signals: Indicators such as a breakdown in price support levels or bearish candlestick patterns can provide confirmation of a trend reversal.


Price is Decreasing, but RSI is Increasing (negative correlation):

Bullish Divergence: This occurs when the stock price makes new lows, but the RSI makes higher lows. This divergence indicates that the selling momentum is weakening.

Potential Reversal: While the price continues to fall, the increasing RSI suggests that the selling pressure is diminishing, which could lead to a reversal.

Actions:

Consider potential buying opportunities: The weakening downward momentum might present an opportunity to buy shares at a lower price before a potential reversal.

Wait for confirmation of the trend reversal before making significant trades: Ensure that there are additional signs of a trend reversal, such as a breakout above resistance levels or bullish candlestick patterns, before committing to large trades.

Monitor for additional bullish signals: Look for further confirmation from other technical indicators or volume patterns to strengthen the case for a reversal.

Section 3: Q3 Answer

3.) What is the financial health of each company, and what are the primary metrics to focus on?

To analyze the financial statements correctly, we have to consider the main ratios to focus on and how they changed over the period from 6/29/2019 to 3/30/2024, quarter by quarter.

The following ratios will be used:

In [15]:
# Load the Excel file
file_path = r"../python/Ratios.xlsx"
xls = pd.ExcelFile(file_path)

# Load all sheets into dataframes
dfs = {sheet: pd.read_excel(file_path, sheet_name=sheet) for sheet in xls.sheet_names}

# Replace non-breaking spaces with regular spaces in column names for all sheets
for df in dfs.values():
    df.columns = [col.replace('\xa0', ' ') for col in df.columns]

# Function to extract ratios for a specific company
def extract_ratios(company, dfs):
    exclude_sheets = ['Free_Cash_Flow_Per_Share', 'Debt_Ratio', 'Free_Cash_Flow_Margin']
    ratios = {}
    for sheet, df in dfs.items():
        if sheet in exclude_sheets:
            continue
        columns = [col for col in df.columns if company in col]
        if columns:
            ratios[sheet] = df[['Quarter Ending'] + columns]
    return ratios

# Extract ratios for each company
companies = ['AAPL', 'ADBE', 'AMZN', 'MSFT', 'NVDA', 'TSLA']
company_ratios = {company: extract_ratios(company, dfs) for company in companies}

# Plot settings
def plot_ratios(ratios, company, colors):
    num_sheets = len(ratios)
    num_rows = (num_sheets + 2) // 3
    fig, axs = plt.subplots(num_rows, 3, figsize=(20, 5 * num_rows))
    fig.suptitle(f'{company} Ratios Over Time', fontsize=24, fontweight='bold', color='#4A249D', x=0.52)
    sheets = list(ratios.keys())
    
    for i, sheet in enumerate(sheets):
        ax = axs[i // 3, i % 3] if num_rows > 1 else axs[i % 3]
        df = ratios[sheet]
        for col in df.columns[1:]:
            ax.scatter(df['Quarter Ending'], df[col], label=col, color=colors[i % len(colors)], marker='o')
            
            # Calculate and plot the trend line
            x = mdates.date2num(df['Quarter Ending'])
            y = df[col]
            z = np.polyfit(x, y, 1)
            p = np.poly1d(z)
            ax.plot(df['Quarter Ending'], p(x), color='#569a8c', linestyle='--', linewidth=2)
            
            # Customize each subplot
            ax.set_title(sheet, fontsize=16)
            ax.set_xlabel('Year', fontsize=12, fontweight='bold')
            ax.set_ylabel('Value', fontsize=12, fontweight='bold')
            ax.grid(True, linestyle='--', alpha=0.7)
            
            # Make tick labels bold
            for tick in ax.get_xticklabels():
                tick.set_fontweight('bold')
            for tick in ax.get_yticklabels():
                tick.set_fontweight('bold')

            # Format x-axis
            ax.xaxis.set_major_locator(mdates.YearLocator())
            ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
            ax.tick_params(axis='x', rotation=45)
            ax.legend()
        
        # Set background color for all axes
        ax.set_facecolor('#ebd3ad')
    
    fig.patch.set_facecolor('#f7ebd8')  # Set background color for figure
    plt.tight_layout()
    plt.subplots_adjust(top=0.85)  # Adjust the top spacing for the main title
    plt.show()

# Plot ratios for each company
colors = ['#6a4c93', '#1d0e2c', '#4a3931', '#e04d01', '#104b51', '#8f53fe']
for company, ratios in company_ratios.items():
    plot_ratios(ratios, company, colors)
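The workbook above stores pre-computed ratios. For reference, a minimal sketch of how such ratios derive from raw statement line items (all figures are illustrative, not taken from any filing):

```python
# Illustrative quarterly figures (not real filings).
quarter = {
    "current_assets": 135_000,
    "current_liabilities": 124_000,
    "total_debt": 110_000,
    "shareholder_equity": 62_000,
    "net_income": 24_000,
    "revenue": 95_000,
}

# Three common health ratios: liquidity, leverage, profitability
ratios = {
    "Current Ratio": quarter["current_assets"] / quarter["current_liabilities"],
    "Debt-to-Equity": quarter["total_debt"] / quarter["shareholder_equity"],
    "Net Margin": quarter["net_income"] / quarter["revenue"],
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```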

Section 4: Q4 Answer

4.) How can we predict future stock behavior for each stock using predictive models?

*Statistical Models*:

Skewed Bell Curve.

GARCH (Generalized Autoregressive Conditional Heteroskedasticity):

Useful for modeling time series with changing volatility over time.

Often used for financial data where volatility clustering is observed.

Imagine a puppy running:

Generalized Autoregressive (G):

Think about how a puppy runs around. Sometimes it runs fast, sometimes slow, but it often follows patterns: if it was running fast, it might continue running fast for a while.

Conditional Heteroskedasticity (ARCH):

Imagine that the puppy runs more erratically (zig-zags) when it gets excited. Sometimes it's calm and runs straight; other times it's excited and zig-zags a lot. The amount of zig-zagging changes over time.

Putting it all together:

GARCH is like trying to predict how much the puppy will zig-zag based on how much it zig-zagged before. If the puppy was very excited and zig-zagging a lot yesterday, it might still be excited and zig-zagging today.
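In formula form, the analogy is the GARCH(1,1) variance recursion: today's variance is `omega + alpha * r[t-1]**2 + beta * sigma2[t-1]`, i.e. a constant plus yesterday's squared shock plus yesterday's variance. A minimal simulation sketch (the parameter values are illustrative assumptions, not fitted estimates):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative GARCH(1,1) parameters (assumptions, not fitted values);
# alpha + beta < 1 keeps the process stationary.
omega, alpha, beta = 0.05, 0.10, 0.85

n = 500
returns = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = omega / (1 - alpha - beta)  # start at the long-run variance

for t in range(1, n):
    # Today's "zig-zag level" depends on yesterday's shock and variance
    sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    returns[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

print(f"Long-run volatility: {np.sqrt(omega / (1 - alpha - beta)):.3f}")
print(f"Sample volatility:   {returns[1:].std():.3f}")
```

Because beta is close to 1, a large shock keeps sigma2 elevated for many following days, which is exactly the volatility clustering GARCH is designed to capture.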

*Deep Learning Model*:

LSTM stands for Long Short-Term Memory. It's a type of recurrent neural network (RNN) architecture that's particularly well-suited for processing and predicting time series data.

Imagine a Kid Trying to Remember a Story

The Memory Book:

Imagine you have a big memory book where you write down important things to remember.

But you don't have infinite pages, so you need to decide what to keep and what to forget.

The Helper:

You have a helper named "LSTM" who helps you remember important parts of the story and forget unimportant parts.

Reading a Story:

Every day, you read a part of a story, and your helper LSTM is there to help you.

LSTM has a few special tools: a highlighter, an eraser, and a bookmark.

Highlight Important Parts:

As you read, LSTM uses the highlighter to mark important sentences that you need to remember.

This is like when you need to remember key events in the story (important information).

Erase Unimportant Parts:

Sometimes, there are parts of the story that are not so important. LSTM uses the eraser to remove these parts from your memory.

This way, your memory book doesn’t get too full with unnecessary details.

Bookmark for Context:

LSTM also uses a bookmark to keep track of where you are in the story. This helps it understand what happened before and what might happen next.

This is like remembering the sequence of events (context).

Daily Updates:

Every day, after reading, LSTM updates the memory book by adding new highlights, erasing unnecessary parts, and moving the bookmark.

This keeps your memory fresh and focused on important details.

Using the Memory:

When you need to tell the story to someone else, you use your updated memory book.

You can recall the important parts of the story in the right order because LSTM has helped you keep your memory organized.

Connecting to LSTM in Computers:

LSTM in computers works similarly to your helper. It reads sequences of data (like parts of a story), highlights important information, forgets unimportant details, and keeps track of the order of events.

This is very useful for tasks like predicting stock prices, where it's important to remember previous trends (important events) and their order (context).
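The highlighter, eraser, and bookmark correspond to the gates of an LSTM cell. A minimal single-cell sketch in NumPy with toy dimensions and random weights (a conceptual illustration only, not the Keras model used in this project):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step; weights are stacked for the
    forget (f), input (i), candidate (g), and output (o) gates."""
    z = W @ x + U @ h_prev + b
    f, i, g, o = np.split(z, 4)
    f = sigmoid(f)          # "eraser": how much old memory to keep
    i = sigmoid(i)          # "highlighter": how much new info to write
    g = np.tanh(g)          # candidate new memory content
    o = sigmoid(o)          # how much memory to expose
    c = f * c_prev + i * g  # updated memory book
    h = o * np.tanh(c)      # "bookmark": output carried to the next step
    return h, c

# Toy dimensions: 3 input features, 4 hidden units, random weights
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.standard_normal((4 * n_hid, n_in)) * 0.1
U = rng.standard_normal((4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):  # a 5-step toy sequence
    h, c = lstm_step(x, h, c, W, U, b)
print("hidden state:", np.round(h, 3))
```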

AAPL Stock Price Prediction

Skewed Bell Curve

In [91]:
def load_data(file_path):
    data = pd.read_csv(file_path, parse_dates=['Date'], index_col='Date')
    return data['Adj Close']

def calculate_statistics(data):
    return {
        'mean': data.mean(),
        'median': data.median(),
        'std_dev': data.std(),
        'skewness': skew(data),
        'kurtosis': kurtosis(data)
    }

def plot_skewed_distribution(data, title):
    stats = calculate_statistics(data)
    
    # Determine kurtosis description (scipy's kurtosis() returns excess
    # kurtosis by default, so 0 here corresponds to a raw kurtosis of 3)
    if stats['kurtosis'] > 0:
        kurt_desc = "Leptokurtic (> 3)"
    elif stats['kurtosis'] < 0:
        kurt_desc = "Platykurtic (< 3)"
    else:
        kurt_desc = "Mesokurtic (= 3)"
    
    # Create the plot
    plt.figure(figsize=(12, 8))
    
    # Plot histogram
    sns.histplot(data, kde=False, stat="density", bins=50, color='skyblue', alpha=0.6)
    
    # Fit and plot skewed normal distribution
    x = np.linspace(data.min(), data.max(), 1000)
    skewed_normal = skewnorm.pdf(x, stats['skewness'], loc=stats['mean'], scale=stats['std_dev'])
    plt.plot(x, skewed_normal, 'r', linewidth=2, label='Fitted Skewed Normal')
    
    # Add mean and median lines
    plt.axvline(stats['mean'], color='green', linestyle='--', linewidth=2, label='Mean')
    plt.axvline(stats['median'], color='purple', linestyle='--', linewidth=2, label='Median')
    
    # Customize the plot
    plt.title(title, fontsize=20, fontweight='bold')
    plt.xlabel('Stock Price', fontsize=14)
    plt.ylabel('Density', fontsize=14)
    plt.legend(fontsize=12)
    
    # Add text box with statistics and kurtosis description
    stats_text = "\n".join(f"{k.capitalize()}: {v:.2f}" for k, v in stats.items())
    stats_text += f"\nKurtosis type: {kurt_desc}"
    plt.text(0.95, 0.95, stats_text, transform=plt.gca().transAxes, 
             verticalalignment='top', horizontalalignment='right',
             bbox=dict(boxstyle='round', facecolor='white', alpha=0.8),
             fontsize=12)
    
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()

def main():
    file_path = '../P2_Stocks/AAPL/AAPL.csv'  # Update this path as needed
    data = load_data(file_path)
    plot_skewed_distribution(data, 'Distribution of AAPL Stock Prices')

if __name__ == "__main__":
    main()
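The curve above is parameterized from sample moments (mean, standard deviation, skewness). As a cross-check, scipy can fit the skew-normal by maximum likelihood instead; a sketch on synthetic skewed data:

```python
from scipy.stats import skewnorm

# Synthetic right-skewed "prices" stand in for the AAPL series
prices = skewnorm.rvs(a=4, loc=150, scale=20, size=1000, random_state=1)

# Maximum-likelihood estimates of shape, location, and scale
a_hat, loc_hat, scale_hat = skewnorm.fit(prices)
print(f"shape={a_hat:.2f}, loc={loc_hat:.2f}, scale={scale_hat:.2f}")
```

When the two approaches disagree noticeably, the MLE parameters usually track the histogram more closely.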

Generalized Autoregressive Conditional Heteroskedasticity (GARCH) Model

In [103]:
import pandas as pd
import matplotlib.pyplot as plt
from arch import arch_model

def load_data(file_path):
    data = pd.read_csv(file_path, parse_dates=['Date'], index_col='Date')
    return data['Adj Close']

def calculate_daily_returns(data):
    daily_returns = data.pct_change().dropna()
    return daily_returns

def fit_garch_model(daily_returns):
    scaled_returns = daily_returns * 100
    model = arch_model(scaled_returns, vol='Garch', p=1, q=1, rescale=False)
    model_fit = model.fit(disp='off')
    return model_fit

def plot_garch_results(daily_returns, model_fit):
    plt.figure(figsize=(12, 6))
    plt.plot(daily_returns, label='Daily Returns', color='#698474')
    plt.plot(model_fit.conditional_volatility / 100, color='#FF4191', label='Conditional Volatility') 
    plt.title('GARCH Model Results')
    plt.xlabel('Date')
    plt.ylabel('Daily Return / Conditional Volatility')
    plt.legend()
    plt.grid(True)
    plt.show()

def main():
    file_path = '../P2_Stocks/AAPL/AAPL.csv'  # Update this path as needed
    data = load_data(file_path)
    
    daily_returns = calculate_daily_returns(data)
    
    model_fit = fit_garch_model(daily_returns)
    # print(model_fit.summary())  # Commented out to suppress the output
    
    plot_garch_results(daily_returns, model_fit)

if __name__ == "__main__":
    main()

180-Day Prediction (LSTM)

In [152]:
# Load and preprocess data
from keras.models import Model
from keras.layers import Dropout
from keras.regularizers import l1_l2
from keras.optimizers import Adam
from keras.losses import Huber
from keras.metrics import RootMeanSquaredError, MeanAbsolutePercentageError
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from sklearn.model_selection import TimeSeriesSplit

def load_and_preprocess_data(file_path):
    data = pd.read_csv(file_path, index_col='Date', parse_dates=True)
    
    # Calculate MACD and Signal Line
    ema_50 = data['Exponential Moving Average_50']
    ema_100 = data['Exponential Moving Average_100']
    data['MACD'] = ema_50 - ema_100
    data['Signal_Line'] = data['MACD'].ewm(span=9, adjust=False).mean()
    
    features = ['Adj Close', 'Simple Moving Average_50', 'Simple Moving Average_100', 
                'Exponential Moving Average_50', 'Exponential Moving Average_100', 
                'Relative Strength Index 14', 'Daily_Return', 'Volume', 'MACD', 'Signal_Line']
    data = data[features].dropna()
    
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_data = scaler.fit_transform(data)
    
    return data, scaled_data, scaler

# Create dataset for LSTM
def create_dataset(dataset, time_step=60):
    X, y = [], []
    for i in range(len(dataset) - time_step):
        X.append(dataset[i:(i + time_step)])
        y.append(dataset[i + time_step, 0])  # Predicting 'Adj Close'
    return np.array(X), np.array(y)

# Build LSTM model
def build_model(input_shape):
    inputs = Input(shape=input_shape)
    x = LSTM(64, return_sequences=True, kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4))(inputs)
    x = Dropout(0.2)(x)
    x = LSTM(32, return_sequences=False, kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4))(x)
    x = Dropout(0.2)(x)
    x = Dense(16, activation='relu')(x)
    outputs = Dense(1)(x)
    
    model = Model(inputs=inputs, outputs=outputs)
    
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss=Huber(), metrics=[RootMeanSquaredError(), MeanAbsolutePercentageError()])
    return model

# Train model with cross-validation
def train_model(X, y, model, n_splits=5):
    tscv = TimeSeriesSplit(n_splits=n_splits)
    histories = []
    
    for fold, (train_index, val_index) in enumerate(tscv.split(X), 1):
        X_train, X_val = X[train_index], X[val_index]
        y_train, y_val = y[train_index], y[val_index]
        
        early_stopping = EarlyStopping(monitor='val_loss', patience=30, restore_best_weights=True)
        model_checkpoint = ModelCheckpoint(f'best_model_fold{fold}.keras', save_best_only=True, monitor='val_loss')
        reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=10, min_lr=0.0001)
        
        history = model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=150,
            batch_size=32,
            callbacks=[early_stopping, model_checkpoint, reduce_lr],
            verbose=0
        )
        histories.append((fold, history))  # Append fold number along with history
    
    return histories

# Generate future predictions
def generate_future_predictions(model, last_sequence, scaler, future_steps, n_features):
    future_predictions = []
    
    for _ in range(future_steps):
        future_pred = model.predict(last_sequence.reshape(1, last_sequence.shape[0], n_features), verbose=0)
        future_predictions.append(future_pred[0, 0])
        
        new_row = np.zeros(n_features)
        new_row[0] = future_pred[0, 0]  # Adj Close
        new_row[1] = np.mean(last_sequence[-50:, 0])  # Simple Moving Average_50
        new_row[2] = np.mean(last_sequence[-100:, 0])  # Simple Moving Average_100 (approximation: only time_step=60 rows are available)
        new_row[3] = (2 * future_pred[0, 0] + 49 * last_sequence[-1, 3]) / 51  # Exponential Moving Average_50
        new_row[4] = (2 * future_pred[0, 0] + 99 * last_sequence[-1, 4]) / 101  # Exponential Moving Average_100
        new_row[5] = last_sequence[-1, 5]  # Relative Strength Index_14 (simplified)
        new_row[6] = (future_pred[0, 0] - last_sequence[-1, 0]) / last_sequence[-1, 0]  # Daily_Return
        new_row[7] = last_sequence[-1, 7]  # volume (simplified)
        new_row[8] = new_row[3] - new_row[4]  # MACD
        new_row[9] = (2 * new_row[8] + 8 * last_sequence[-1, 9]) / 10  # Signal Line
        
        last_sequence = np.vstack((last_sequence[1:], new_row))
    
    future_predictions = scaler.inverse_transform(np.column_stack((future_predictions, np.zeros((len(future_predictions), n_features-1)))))[:, 0]
    return future_predictions

# Plot results
def plot_results(data, future_predictions, future_dates):
    plt.figure(figsize=(20, 10))
    sns.set_style("whitegrid")
    
    plt.plot(data.index, data['Adj Close'], label='Historical Price', color='blue', linewidth=2)
    plt.plot(future_dates, future_predictions, label='Future Predictions (180 Days)', color='red', linewidth=2)
    
    std_dev = np.std(future_predictions)
    plt.fill_between(future_dates, 
                     future_predictions - 2*std_dev, 
                     future_predictions + 2*std_dev, 
                     color='red', alpha=0.2, label='95% Confidence Interval')
    
    plt.title('Stock Price Prediction - Next 180 Days', fontsize=20)
    plt.xlabel('Date', fontsize=16)
    plt.ylabel('Price', fontsize=16)
    plt.legend(fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    
    last_historical_price = data['Adj Close'].iloc[-1]
    final_predicted_price = future_predictions[-1]
    plt.annotate(f'Last Historical Price: ${last_historical_price:.2f}', 
                 xy=(data.index[-1], last_historical_price),
                 xytext=(10, 10), textcoords='offset points', fontsize=12,
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
    plt.annotate(f'Final Predicted Price: ${final_predicted_price:.2f}', 
                 xy=(future_dates[-1], final_predicted_price),
                 xytext=(10, -10), textcoords='offset points', fontsize=12,
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
    
    plt.tight_layout()
    plt.show()

# Plot training history
def plot_training_history(histories):
    plt.figure(figsize=(20, 10))
    sns.set_style("whitegrid")
    
    num_folds = len(histories)
    train_colors = sns.color_palette("hsv", num_folds)
    val_colors = sns.color_palette("husl", num_folds)
    
    for (fold, history), train_color, val_color in zip(histories, train_colors, val_colors):
        plt.plot(history.history['loss'], label=f'Train Loss Fold {fold}', color=train_color, linestyle='--', linewidth=2)
        plt.plot(history.history['val_loss'], label=f'Validation Loss Fold {fold}', color=val_color, linewidth=2)
    
    plt.title('Model Training History', fontsize=20)
    plt.xlabel('Epoch', fontsize=16)
    plt.ylabel('Loss', fontsize=16)
    plt.legend(fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    
    plt.tight_layout()
    plt.show()

# Main execution (continued)
file_path = r'../P2_Stocks/AAPL/AAPL.csv'
data, scaled_data, scaler = load_and_preprocess_data(file_path)

time_step = 60
X, y = create_dataset(scaled_data, time_step)

model = build_model((time_step, scaled_data.shape[1]))
histories = train_model(X, y, model)

future_steps = 180  # Changed to 180 days
last_sequence = scaled_data[-time_step:]
future_predictions = generate_future_predictions(model, last_sequence, scaler, future_steps, scaled_data.shape[1])

future_dates = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1), periods=future_steps, freq='B')

plot_results(data, future_predictions, future_dates)
plot_training_history(histories)

ADBE Stock Price Prediction

Skewed Bell Curve

In [153]:
def load_data(file_path):
    data = pd.read_csv(file_path, parse_dates=['Date'], index_col='Date')
    return data['Adj Close']

def calculate_statistics(data):
    return {
        'mean': data.mean(),
        'median': data.median(),
        'std_dev': data.std(),
        'skewness': skew(data),
        'kurtosis': kurtosis(data)
    }

def plot_skewed_distribution(data, title):
    stats = calculate_statistics(data)
    
    # Determine kurtosis description (scipy's kurtosis() returns excess
    # kurtosis by default, so 0 here corresponds to a raw kurtosis of 3)
    if stats['kurtosis'] > 0:
        kurt_desc = "Leptokurtic (> 3)"
    elif stats['kurtosis'] < 0:
        kurt_desc = "Platykurtic (< 3)"
    else:
        kurt_desc = "Mesokurtic (= 3)"
    
    # Create the plot
    plt.figure(figsize=(12, 8))
    
    # Plot histogram
    sns.histplot(data, kde=False, stat="density", bins=50, color='skyblue', alpha=0.6)
    
    # Fit and plot skewed normal distribution
    x = np.linspace(data.min(), data.max(), 1000)
    skewed_normal = skewnorm.pdf(x, stats['skewness'], loc=stats['mean'], scale=stats['std_dev'])
    plt.plot(x, skewed_normal, 'r', linewidth=2, label='Fitted Skewed Normal')
    
    # Add mean and median lines
    plt.axvline(stats['mean'], color='green', linestyle='--', linewidth=2, label='Mean')
    plt.axvline(stats['median'], color='purple', linestyle='--', linewidth=2, label='Median')
    
    # Customize the plot
    plt.title(title, fontsize=20, fontweight='bold')
    plt.xlabel('Stock Price', fontsize=14)
    plt.ylabel('Density', fontsize=14)
    plt.legend(fontsize=12)
    
    # Add text box with statistics and kurtosis description
    stats_text = "\n".join(f"{k.capitalize()}: {v:.2f}" for k, v in stats.items())
    stats_text += f"\nKurtosis type: {kurt_desc}"
    plt.text(0.95, 0.95, stats_text, transform=plt.gca().transAxes, 
             verticalalignment='top', horizontalalignment='right',
             bbox=dict(boxstyle='round', facecolor='white', alpha=0.8),
             fontsize=12)
    
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()

def main():
    file_path = '../P2_Stocks/ADBE/ADBE.csv'  # Update this path as needed
    data = load_data(file_path)
    plot_skewed_distribution(data, 'Distribution of ADBE Stock Prices')

if __name__ == "__main__":
    main()

Generalized Autoregressive Conditional Heteroskedasticity (GARCH) Model

In [154]:
def load_data(file_path):
    data = pd.read_csv(file_path, parse_dates=['Date'], index_col='Date')
    return data['Adj Close']

def calculate_daily_returns(data):
    daily_returns = data.pct_change().dropna()
    return daily_returns

def fit_garch_model(daily_returns):
    scaled_returns = daily_returns * 100
    model = arch_model(scaled_returns, vol='Garch', p=1, q=1, rescale=False)
    model_fit = model.fit(disp='off')
    return model_fit

def plot_garch_results(daily_returns, model_fit):
    plt.figure(figsize=(12, 6))
    plt.plot(daily_returns, label='Daily Returns', color='#698474')
    plt.plot(model_fit.conditional_volatility / 100, color='#FF4191', label='Conditional Volatility') 
    plt.title('GARCH Model Results')
    plt.xlabel('Date')
    plt.ylabel('Daily Return / Conditional Volatility')
    plt.legend()
    plt.grid(True)
    plt.show()

def main():
    file_path = '../P2_Stocks/ADBE/ADBE.csv'  # Update this path as needed
    data = load_data(file_path)
    
    daily_returns = calculate_daily_returns(data)
    
    model_fit = fit_garch_model(daily_returns)
    # print(model_fit.summary())  # Commented out to suppress the output
    
    plot_garch_results(daily_returns, model_fit)

if __name__ == "__main__":
    main()

180-Day Prediction (LSTM)

In [155]:
# Load and preprocess data
def load_and_preprocess_data(file_path):
    data = pd.read_csv(file_path, index_col='Date', parse_dates=True)
    
    # Calculate MACD and Signal Line
    ema_50 = data['Exponential Moving Average_50']
    ema_100 = data['Exponential Moving Average_100']
    data['MACD'] = ema_50 - ema_100
    data['Signal_Line'] = data['MACD'].ewm(span=9, adjust=False).mean()
    
    features = ['Adj Close', 'Simple Moving Average_50', 'Simple Moving Average_100', 
                'Exponential Moving Average_50', 'Exponential Moving Average_100', 
                'Relative Strength Index 14', 'Daily_Return', 'Volume', 'MACD', 'Signal_Line']
    data = data[features].dropna()
    
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_data = scaler.fit_transform(data)
    
    return data, scaled_data, scaler

# Create dataset for LSTM
def create_dataset(dataset, time_step=60):
    X, y = [], []
    for i in range(len(dataset) - time_step):
        X.append(dataset[i:(i + time_step)])
        y.append(dataset[i + time_step, 0])  # Predicting 'Adj Close'
    return np.array(X), np.array(y)

# Build LSTM model
def build_model(input_shape):
    inputs = Input(shape=input_shape)
    x = LSTM(64, return_sequences=True, kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4))(inputs)
    x = Dropout(0.2)(x)
    x = LSTM(32, return_sequences=False, kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4))(x)
    x = Dropout(0.2)(x)
    x = Dense(16, activation='relu')(x)
    outputs = Dense(1)(x)
    
    model = Model(inputs=inputs, outputs=outputs)
    
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss=Huber(), metrics=[RootMeanSquaredError(), MeanAbsolutePercentageError()])
    return model

# Train model with cross-validation
def train_model(X, y, model, n_splits=5):
    tscv = TimeSeriesSplit(n_splits=n_splits)
    histories = []
    
    for fold, (train_index, val_index) in enumerate(tscv.split(X), 1):
        X_train, X_val = X[train_index], X[val_index]
        y_train, y_val = y[train_index], y[val_index]
        
        early_stopping = EarlyStopping(monitor='val_loss', patience=30, restore_best_weights=True)
        model_checkpoint = ModelCheckpoint(f'best_model_fold{fold}.keras', save_best_only=True, monitor='val_loss')
        reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=10, min_lr=0.0001)
        
        history = model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=150,
            batch_size=32,
            callbacks=[early_stopping, model_checkpoint, reduce_lr],
            verbose=0
        )
        histories.append((fold, history))  # Append fold number along with history
    
    return histories

# Generate future predictions
def generate_future_predictions(model, last_sequence, scaler, future_steps, n_features):
    future_predictions = []
    
    for _ in range(future_steps):
        future_pred = model.predict(last_sequence.reshape(1, last_sequence.shape[0], n_features), verbose=0)
        future_predictions.append(future_pred[0, 0])
        
        new_row = np.zeros(n_features)
        new_row[0] = future_pred[0, 0]  # Adj Close
        new_row[1] = np.mean(last_sequence[-50:, 0])  # Simple Moving Average_50
        new_row[2] = np.mean(last_sequence[-100:, 0])  # Simple Moving Average_100 (approximation: only time_step=60 rows are available)
        new_row[3] = (2 * future_pred[0, 0] + 49 * last_sequence[-1, 3]) / 51  # Exponential Moving Average_50
        new_row[4] = (2 * future_pred[0, 0] + 99 * last_sequence[-1, 4]) / 101  # Exponential Moving Average_100
        new_row[5] = last_sequence[-1, 5]  # Relative Strength Index_14 (simplified)
        new_row[6] = (future_pred[0, 0] - last_sequence[-1, 0]) / last_sequence[-1, 0]  # Daily_Return
        new_row[7] = last_sequence[-1, 7]  # volume (simplified)
        new_row[8] = new_row[3] - new_row[4]  # MACD
        new_row[9] = (2 * new_row[8] + 8 * last_sequence[-1, 9]) / 10  # Signal Line
        
        last_sequence = np.vstack((last_sequence[1:], new_row))
    
    future_predictions = scaler.inverse_transform(np.column_stack((future_predictions, np.zeros((len(future_predictions), n_features-1)))))[:, 0]
    return future_predictions

# Plot results
def plot_results(data, future_predictions, future_dates):
    plt.figure(figsize=(20, 10))
    sns.set_style("whitegrid")
    
    plt.plot(data.index, data['Adj Close'], label='Historical Price', color='blue', linewidth=2)
    plt.plot(future_dates, future_predictions, label='Future Predictions (180 Days)', color='red', linewidth=2)
    
    std_dev = np.std(future_predictions)
    plt.fill_between(future_dates, 
                     future_predictions - 2*std_dev, 
                     future_predictions + 2*std_dev, 
                     color='red', alpha=0.2, label='95% Confidence Interval')
    
    plt.title('Stock Price Prediction - Next 180 Days', fontsize=20)
    plt.xlabel('Date', fontsize=16)
    plt.ylabel('Price', fontsize=16)
    plt.legend(fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    
    last_historical_price = data['Adj Close'].iloc[-1]
    final_predicted_price = future_predictions[-1]
    plt.annotate(f'Last Historical Price: ${last_historical_price:.2f}', 
                 xy=(data.index[-1], last_historical_price),
                 xytext=(10, 10), textcoords='offset points', fontsize=12,
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
    plt.annotate(f'Final Predicted Price: ${final_predicted_price:.2f}', 
                 xy=(future_dates[-1], final_predicted_price),
                 xytext=(10, -10), textcoords='offset points', fontsize=12,
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
    
    plt.tight_layout()
    plt.show()

# Plot training history
def plot_training_history(histories):
    plt.figure(figsize=(20, 10))
    sns.set_style("whitegrid")
    
    num_folds = len(histories)
    train_colors = sns.color_palette("hsv", num_folds)
    val_colors = sns.color_palette("husl", num_folds)
    
    for (fold, history), train_color, val_color in zip(histories, train_colors, val_colors):
        plt.plot(history.history['loss'], label=f'Train Loss Fold {fold}', color=train_color, linestyle='--', linewidth=2)
        plt.plot(history.history['val_loss'], label=f'Validation Loss Fold {fold}', color=val_color, linewidth=2)
    
    plt.title('Model Training History', fontsize=20)
    plt.xlabel('Epoch', fontsize=16)
    plt.ylabel('Loss', fontsize=16)
    plt.legend(fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    
    plt.tight_layout()
    plt.show()

# Main execution
file_path = r'../P2_Stocks/ADBE/ADBE.csv'
data, scaled_data, scaler = load_and_preprocess_data(file_path)

time_step = 60
X, y = create_dataset(scaled_data, time_step)

model = build_model((time_step, scaled_data.shape[1]))
histories = train_model(X, y, model)

future_steps = 180  # forecast horizon: 180 business days
last_sequence = scaled_data[-time_step:]
future_predictions = generate_future_predictions(model, last_sequence, scaler, future_steps, scaled_data.shape[1])

future_dates = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1), periods=future_steps, freq='B')

plot_results(data, future_predictions, future_dates)
plot_training_history(histories)
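The EMA updates inside `generate_future_predictions` apply the standard recursion EMA_t = α·P_t + (1 − α)·EMA_{t−1} with α = 2/(span + 1), so span 50 gives (2·P_t + 49·EMA_{t−1})/51. A small standalone check on synthetic prices (illustrative only) that this matches pandas' `ewm(..., adjust=False)`:

```python
import numpy as np
import pandas as pd

prices = pd.Series(np.linspace(100.0, 110.0, 200))

# pandas EMA with span=50 and adjust=False uses alpha = 2 / (span + 1)
ema_pandas = prices.ewm(span=50, adjust=False).mean().iloc[-1]

# manual recursion, exactly as in the prediction loop:
# EMA_t = (2 * P_t + 49 * EMA_{t-1}) / 51
ema_manual = prices.iloc[0]
for p in prices.iloc[1:]:
    ema_manual = (2 * p + 49 * ema_manual) / 51

assert abs(ema_manual - ema_pandas) < 1e-9
print(round(ema_manual, 4))
```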
[Figure: ADBE historical price with 180-day LSTM forecast]
[Figure: LSTM training and validation loss per fold]

AMZN Stock Price Prediction

Skewed Bell Curve

In [156]:
def load_data(file_path):
    data = pd.read_csv(file_path, parse_dates=['Date'], index_col='Date')
    return data['Close']

def calculate_statistics(data):
    return {
        'mean': data.mean(),
        'median': data.median(),
        'std_dev': data.std(),
        'skewness': skew(data),
        'kurtosis': kurtosis(data)
    }

def plot_skewed_distribution(data, title):
    stats = calculate_statistics(data)
    
    # Determine kurtosis description: scipy's kurtosis() returns *excess*
    # kurtosis (normal = 0), so > 0 corresponds to a raw kurtosis > 3
    if stats['kurtosis'] > 0:
        kurt_desc = "Leptokurtic (> 3)"
    elif stats['kurtosis'] < 0:
        kurt_desc = "Platykurtic (< 3)"
    else:
        kurt_desc = "Mesokurtic (= 3)"
    
    # Create the plot
    plt.figure(figsize=(12, 8))
    
    # Plot histogram
    sns.histplot(data, kde=False, stat="density", bins=50, color='skyblue', alpha=0.6)
    
    # Overlay a skewed normal curve; using the sample skewness as skewnorm's
    # shape parameter is a rough heuristic (skewnorm.fit would give an MLE fit)
    x = np.linspace(data.min(), data.max(), 1000)
    skewed_normal = skewnorm.pdf(x, stats['skewness'], loc=stats['mean'], scale=stats['std_dev'])
    plt.plot(x, skewed_normal, 'r', linewidth=2, label='Fitted Skewed Normal')
    
    # Add mean and median lines
    plt.axvline(stats['mean'], color='green', linestyle='--', linewidth=2, label='Mean')
    plt.axvline(stats['median'], color='purple', linestyle='--', linewidth=2, label='Median')
    
    # Customize the plot
    plt.title(title, fontsize=20, fontweight='bold')
    plt.xlabel('Stock Price', fontsize=14)
    plt.ylabel('Density', fontsize=14)
    plt.legend(fontsize=12)
    
    # Add text box with statistics and kurtosis description
    stats_text = "\n".join(f"{k.capitalize()}: {v:.2f}" for k, v in stats.items())
    stats_text += f"\nKurtosis: {kurt_desc}"
    plt.text(0.95, 0.95, stats_text, transform=plt.gca().transAxes, 
             verticalalignment='top', horizontalalignment='right',
             bbox=dict(boxstyle='round', facecolor='white', alpha=0.8),
             fontsize=12)
    
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()

def main():
    file_path = '../P2_Stocks/AMZN/AMZN.csv'  # Update this path as needed
    data = load_data(file_path)
    plot_skewed_distribution(data, 'Distribution of AMZN Stock Prices')

if __name__ == "__main__":
    main()
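As a side note, scipy can also fit the skew-normal parameters by maximum likelihood rather than plugging in the sample skewness as the shape parameter. A minimal sketch on synthetic data (the true parameters a=4, loc=100, scale=15 are invented for illustration):

```python
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(0)
sample = skewnorm.rvs(a=4, loc=100, scale=15, size=5000, random_state=rng)

# skewnorm.fit returns (shape a, loc, scale) estimated by maximum likelihood
a_hat, loc_hat, scale_hat = skewnorm.fit(sample)
print(a_hat, loc_hat, scale_hat)  # should land near (4, 100, 15)
```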
[Figure: distribution of AMZN closing prices with fitted skewed normal]

Generalized Autoregressive Conditional Heteroskedasticity (GARCH) Model

In [157]:
from arch import arch_model  # needed by this cell; not in the imports cell at the top

def load_data(file_path):
    data = pd.read_csv(file_path, parse_dates=['Date'], index_col='Date')
    return data['Close']

def calculate_daily_returns(data):
    daily_returns = data.pct_change().dropna()
    return daily_returns

def fit_garch_model(daily_returns):
    scaled_returns = daily_returns * 100
    model = arch_model(scaled_returns, vol='Garch', p=1, q=1, rescale=False)
    model_fit = model.fit(disp='off')
    return model_fit

def plot_garch_results(daily_returns, model_fit):
    plt.figure(figsize=(12, 6))
    plt.plot(daily_returns, label='Daily Returns', color='#698474')
    plt.plot(model_fit.conditional_volatility / 100, color='#FF4191', label='Conditional Volatility') 
    plt.title('GARCH Model Results')
    plt.xlabel('Date')
    plt.ylabel('Daily Return / Conditional Volatility')
    plt.legend()
    plt.grid(True)
    plt.show()

def main():
    file_path = '../P2_Stocks/AMZN/AMZN.csv'  # Update this path as needed
    data = load_data(file_path)
    
    daily_returns = calculate_daily_returns(data)
    
    model_fit = fit_garch_model(daily_returns)
    # print(model_fit.summary())  # Commented out to suppress the output
    
    plot_garch_results(daily_returns, model_fit)

if __name__ == "__main__":
    main()
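For reference, the GARCH(1,1) model fitted above assumes the conditional variance follows σ²_t = ω + α·ε²_{t−1} + β·σ²_{t−1}. A pure-NumPy simulation of that recursion (ω, α, β here are invented for illustration, not fitted values from `arch`):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical GARCH(1,1) parameters; alpha + beta < 1 ensures stationarity
omega, alpha, beta = 0.05, 0.10, 0.85

n = 1000
returns = np.empty(n)
sigma2 = np.empty(n)
sigma2[0] = omega / (1 - alpha - beta)  # unconditional variance = 1.0 here
returns[0] = np.sqrt(sigma2[0]) * rng.standard_normal()

for t in range(1, n):
    # variance recursion: yesterday's shock and yesterday's variance both persist
    sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    returns[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

# sample variance should hover near the unconditional variance of 1.0
print(returns.var())
```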
[Figure: AMZN daily returns with GARCH(1,1) conditional volatility]

180-Day Prediction (LSTM)

In [158]:
# Additional imports used by this cell (not in the setup cell at the top)
from keras.models import Model
from keras.layers import Dropout
from keras.regularizers import l1_l2
from keras.optimizers import Adam
from keras.losses import Huber
from keras.metrics import RootMeanSquaredError, MeanAbsolutePercentageError
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from sklearn.model_selection import TimeSeriesSplit

# Load and preprocess data
def load_and_preprocess_data(file_path):
    data = pd.read_csv(file_path, index_col='Date', parse_dates=True)
    
    # Calculate MACD and Signal Line
    ema_50 = data['Exponential Moving Average_50']
    ema_100 = data['Exponential Moving Average_100']
    data['MACD'] = ema_50 - ema_100
    data['Signal_Line'] = data['MACD'].ewm(span=9, adjust=False).mean()
    
    features = ['Close', 'Simple Moving Average_50', 'Simple Moving Average_100', 
                'Exponential Moving Average_50', 'Exponential Moving Average_100', 
                'Relative Strength Index 14', 'Daily_Return', 'Volume', 'MACD', 'Signal_Line']
    data = data[features].dropna()
    
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_data = scaler.fit_transform(data)
    
    return data, scaled_data, scaler

# Create dataset for LSTM
def create_dataset(dataset, time_step=60):
    X, y = [], []
    for i in range(len(dataset) - time_step):
        X.append(dataset[i:(i + time_step)])
        y.append(dataset[i + time_step, 0])  # Predicting 'Close'
    return np.array(X), np.array(y)

# Build LSTM model
def build_model(input_shape):
    inputs = Input(shape=input_shape)
    x = LSTM(64, return_sequences=True, kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4))(inputs)
    x = Dropout(0.2)(x)
    x = LSTM(32, return_sequences=False, kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4))(x)
    x = Dropout(0.2)(x)
    x = Dense(16, activation='relu')(x)
    outputs = Dense(1)(x)
    
    model = Model(inputs=inputs, outputs=outputs)
    
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss=Huber(), metrics=[RootMeanSquaredError(), MeanAbsolutePercentageError()])
    return model

# Train model with cross-validation
def train_model(X, y, model, n_splits=5):
    tscv = TimeSeriesSplit(n_splits=n_splits)
    histories = []
    
    for fold, (train_index, val_index) in enumerate(tscv.split(X), 1):
        X_train, X_val = X[train_index], X[val_index]
        y_train, y_val = y[train_index], y[val_index]
        
        early_stopping = EarlyStopping(monitor='val_loss', patience=30, restore_best_weights=True)
        model_checkpoint = ModelCheckpoint(f'best_model_fold{fold}.keras', save_best_only=True, monitor='val_loss')
        reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=10, min_lr=0.0001)
        
        history = model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=150,
            batch_size=32,
            callbacks=[early_stopping, model_checkpoint, reduce_lr],
            verbose=0
        )
        histories.append((fold, history))  # Append fold number along with history
    
    return histories

# Generate future predictions
def generate_future_predictions(model, last_sequence, scaler, future_steps, n_features):
    future_predictions = []
    
    for _ in range(future_steps):
        future_pred = model.predict(last_sequence.reshape(1, last_sequence.shape[0], n_features), verbose=0)
        future_predictions.append(future_pred[0, 0])
        
        new_row = np.zeros(n_features)
        new_row[0] = future_pred[0, 0]  # Close
        new_row[1] = np.mean(last_sequence[-50:, 0])  # Simple Moving Average_50
        new_row[2] = np.mean(last_sequence[-100:, 0])  # Simple Moving Average_100 (approx.: only time_step=60 rows are available)
        new_row[3] = (2 * future_pred[0, 0] + 49 * last_sequence[-1, 3]) / 51  # Exponential Moving Average_50 (alpha = 2/51)
        new_row[4] = (2 * future_pred[0, 0] + 99 * last_sequence[-1, 4]) / 101  # Exponential Moving Average_100 (alpha = 2/101)
        new_row[5] = last_sequence[-1, 5]  # Relative Strength Index_14 (held constant, simplified)
        new_row[6] = (future_pred[0, 0] - last_sequence[-1, 0]) / last_sequence[-1, 0]  # Daily_Return (computed on scaled prices)
        new_row[7] = last_sequence[-1, 7]  # Volume (held constant, simplified)
        new_row[8] = new_row[3] - new_row[4]  # MACD
        new_row[9] = (2 * new_row[8] + 8 * last_sequence[-1, 9]) / 10  # Signal Line (alpha = 2/10, i.e. span 9)
        
        last_sequence = np.vstack((last_sequence[1:], new_row))
    
    future_predictions = scaler.inverse_transform(np.column_stack((future_predictions, np.zeros((len(future_predictions), n_features-1)))))[:, 0]
    return future_predictions

# Plot results
def plot_results(data, future_predictions, future_dates):
    plt.figure(figsize=(20, 10))
    sns.set_style("whitegrid")
    
    plt.plot(data.index, data['Close'], label='Historical Price', color='blue', linewidth=2)
    plt.plot(future_dates, future_predictions, label='Future Predictions (180 Days)', color='red', linewidth=2)
    
    # ±2σ of the predicted path: a rough visual band, not a true forecast interval
    std_dev = np.std(future_predictions)
    plt.fill_between(future_dates, 
                     future_predictions - 2*std_dev, 
                     future_predictions + 2*std_dev, 
                     color='red', alpha=0.2, label='±2σ Band')
    
    plt.title('Stock Price Prediction - Next 180 Days', fontsize=20)
    plt.xlabel('Date', fontsize=16)
    plt.ylabel('Price', fontsize=16)
    plt.legend(fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    
    last_historical_price = data['Close'].iloc[-1]
    final_predicted_price = future_predictions[-1]
    plt.annotate(f'Last Historical Price: ${last_historical_price:.2f}', 
                 xy=(data.index[-1], last_historical_price),
                 xytext=(10, 10), textcoords='offset points', fontsize=12,
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
    plt.annotate(f'Final Predicted Price: ${final_predicted_price:.2f}', 
                 xy=(future_dates[-1], final_predicted_price),
                 xytext=(10, -10), textcoords='offset points', fontsize=12,
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
    
    plt.tight_layout()
    plt.show()

# Plot training history
def plot_training_history(histories):
    plt.figure(figsize=(20, 10))
    sns.set_style("whitegrid")
    
    num_folds = len(histories)
    train_colors = sns.color_palette("hsv", num_folds)
    val_colors = sns.color_palette("husl", num_folds)
    
    for (fold, history), train_color, val_color in zip(histories, train_colors, val_colors):
        plt.plot(history.history['loss'], label=f'Train Loss Fold {fold}', color=train_color, linestyle='--', linewidth=2)
        plt.plot(history.history['val_loss'], label=f'Validation Loss Fold {fold}', color=val_color, linewidth=2)
    
    plt.title('Model Training History', fontsize=20)
    plt.xlabel('Epoch', fontsize=16)
    plt.ylabel('Loss', fontsize=16)
    plt.legend(fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    
    plt.tight_layout()
    plt.show()

# Main execution
file_path = r'../P2_Stocks/AMZN/AMZN.csv'
data, scaled_data, scaler = load_and_preprocess_data(file_path)

time_step = 60
X, y = create_dataset(scaled_data, time_step)

model = build_model((time_step, scaled_data.shape[1]))
histories = train_model(X, y, model)

future_steps = 180  # forecast horizon: 180 business days
last_sequence = scaled_data[-time_step:]
future_predictions = generate_future_predictions(model, last_sequence, scaler, future_steps, scaled_data.shape[1])

future_dates = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1), periods=future_steps, freq='B')

plot_results(data, future_predictions, future_dates)
plot_training_history(histories)
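`create_dataset` above builds overlapping windows: each sample holds `time_step` consecutive rows of all features, and the target is column 0 (the price) of the row immediately after the window. A quick shape check on toy data:

```python
import numpy as np

def create_dataset(dataset, time_step=60):
    X, y = [], []
    for i in range(len(dataset) - time_step):
        X.append(dataset[i:(i + time_step)])
        y.append(dataset[i + time_step, 0])  # target: next row's first column
    return np.array(X), np.array(y)

toy = np.arange(200 * 10, dtype=float).reshape(200, 10)  # 200 rows, 10 features
X, y = create_dataset(toy, time_step=60)

assert X.shape == (140, 60, 10)  # 200 - 60 overlapping windows of length 60
assert y.shape == (140,)
assert y[0] == toy[60, 0]        # first target is row 60's first column
print(X.shape, y.shape)
```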
[Figure: AMZN historical price with 180-day LSTM forecast]
[Figure: LSTM training and validation loss per fold]

MSFT Stock Price Prediction

Skewed Bell Curve

In [159]:
def load_data(file_path):
    data = pd.read_csv(file_path, parse_dates=['Date'], index_col='Date')
    return data['Adj Close']

def calculate_statistics(data):
    return {
        'mean': data.mean(),
        'median': data.median(),
        'std_dev': data.std(),
        'skewness': skew(data),
        'kurtosis': kurtosis(data)
    }

def plot_skewed_distribution(data, title):
    stats = calculate_statistics(data)
    
    # Determine kurtosis description: scipy's kurtosis() returns *excess*
    # kurtosis (normal = 0), so > 0 corresponds to a raw kurtosis > 3
    if stats['kurtosis'] > 0:
        kurt_desc = "Leptokurtic (> 3)"
    elif stats['kurtosis'] < 0:
        kurt_desc = "Platykurtic (< 3)"
    else:
        kurt_desc = "Mesokurtic (= 3)"
    
    # Create the plot
    plt.figure(figsize=(12, 8))
    
    # Plot histogram
    sns.histplot(data, kde=False, stat="density", bins=50, color='skyblue', alpha=0.6)
    
    # Overlay a skewed normal curve; using the sample skewness as skewnorm's
    # shape parameter is a rough heuristic (skewnorm.fit would give an MLE fit)
    x = np.linspace(data.min(), data.max(), 1000)
    skewed_normal = skewnorm.pdf(x, stats['skewness'], loc=stats['mean'], scale=stats['std_dev'])
    plt.plot(x, skewed_normal, 'r', linewidth=2, label='Fitted Skewed Normal')
    
    # Add mean and median lines
    plt.axvline(stats['mean'], color='green', linestyle='--', linewidth=2, label='Mean')
    plt.axvline(stats['median'], color='purple', linestyle='--', linewidth=2, label='Median')
    
    # Customize the plot
    plt.title(title, fontsize=20, fontweight='bold')
    plt.xlabel('Stock Price', fontsize=14)
    plt.ylabel('Density', fontsize=14)
    plt.legend(fontsize=12)
    
    # Add text box with statistics and kurtosis description
    stats_text = "\n".join(f"{k.capitalize()}: {v:.2f}" for k, v in stats.items())
    stats_text += f"\nKurtosis: {kurt_desc}"
    plt.text(0.95, 0.95, stats_text, transform=plt.gca().transAxes, 
             verticalalignment='top', horizontalalignment='right',
             bbox=dict(boxstyle='round', facecolor='white', alpha=0.8),
             fontsize=12)
    
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()

def main():
    file_path = '../P2_Stocks/MSFT/MSFT.csv'  # Update this path as needed
    data = load_data(file_path)
    plot_skewed_distribution(data, 'Distribution of MSFT Stock Prices')

if __name__ == "__main__":
    main()
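Worth keeping in mind when reading the kurtosis line in the stat boxes: `scipy.stats.kurtosis` reports excess kurtosis by default (Fisher definition), so a normal distribution scores about 0 rather than 3. A quick demonstration on simulated normal data:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
normal_sample = rng.standard_normal(200_000)

# Fisher (default): excess kurtosis, ~0 for a normal distribution
excess = kurtosis(normal_sample)
# Pearson: raw kurtosis, ~3 for a normal distribution
raw = kurtosis(normal_sample, fisher=False)

assert abs(raw - excess - 3) < 1e-9  # the two definitions differ by exactly 3
print(excess, raw)
```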
[Figure: distribution of MSFT adjusted closing prices with fitted skewed normal]

Generalized Autoregressive Conditional Heteroskedasticity (GARCH) Model

In [160]:
def load_data(file_path):
    data = pd.read_csv(file_path, parse_dates=['Date'], index_col='Date')
    return data['Adj Close']

def calculate_daily_returns(data):
    daily_returns = data.pct_change().dropna()
    return daily_returns

def fit_garch_model(daily_returns):
    scaled_returns = daily_returns * 100
    model = arch_model(scaled_returns, vol='Garch', p=1, q=1, rescale=False)
    model_fit = model.fit(disp='off')
    return model_fit

def plot_garch_results(daily_returns, model_fit):
    plt.figure(figsize=(12, 6))
    plt.plot(daily_returns, label='Daily Returns', color='#698474')
    plt.plot(model_fit.conditional_volatility / 100, color='#FF4191', label='Conditional Volatility') 
    plt.title('GARCH Model Results')
    plt.xlabel('Date')
    plt.ylabel('Daily Return / Conditional Volatility')
    plt.legend()
    plt.grid(True)
    plt.show()

def main():
    file_path = '../P2_Stocks/MSFT/MSFT.csv'  # Update this path as needed
    data = load_data(file_path)
    
    daily_returns = calculate_daily_returns(data)
    
    model_fit = fit_garch_model(daily_returns)
    # print(model_fit.summary())  # Commented out to suppress the output
    
    plot_garch_results(daily_returns, model_fit)

if __name__ == "__main__":
    main()
[Figure: MSFT daily returns with GARCH(1,1) conditional volatility]

180-Day Prediction (LSTM)

In [161]:
# Load and preprocess data
def load_and_preprocess_data(file_path):
    data = pd.read_csv(file_path, index_col='Date', parse_dates=True)
    
    # Calculate MACD and Signal Line
    ema_50 = data['Exponential Moving Average_50']
    ema_100 = data['Exponential Moving Average_100']
    data['MACD'] = ema_50 - ema_100
    data['Signal_Line'] = data['MACD'].ewm(span=9, adjust=False).mean()
    
    features = ['Adj Close', 'Simple Moving Average_50', 'Simple Moving Average_100', 
                'Exponential Moving Average_50', 'Exponential Moving Average_100', 
                'Relative Strength Index 14', 'Daily_Return', 'Volume', 'MACD', 'Signal_Line']
    data = data[features].dropna()
    
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_data = scaler.fit_transform(data)
    
    return data, scaled_data, scaler

# Create dataset for LSTM
def create_dataset(dataset, time_step=60):
    X, y = [], []
    for i in range(len(dataset) - time_step):
        X.append(dataset[i:(i + time_step)])
        y.append(dataset[i + time_step, 0])  # Predicting 'Adj Close'
    return np.array(X), np.array(y)

# Build LSTM model
def build_model(input_shape):
    inputs = Input(shape=input_shape)
    x = LSTM(64, return_sequences=True, kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4))(inputs)
    x = Dropout(0.2)(x)
    x = LSTM(32, return_sequences=False, kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4))(x)
    x = Dropout(0.2)(x)
    x = Dense(16, activation='relu')(x)
    outputs = Dense(1)(x)
    
    model = Model(inputs=inputs, outputs=outputs)
    
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss=Huber(), metrics=[RootMeanSquaredError(), MeanAbsolutePercentageError()])
    return model

# Train model with cross-validation
def train_model(X, y, model, n_splits=5):
    tscv = TimeSeriesSplit(n_splits=n_splits)
    histories = []
    
    for fold, (train_index, val_index) in enumerate(tscv.split(X), 1):
        X_train, X_val = X[train_index], X[val_index]
        y_train, y_val = y[train_index], y[val_index]
        
        early_stopping = EarlyStopping(monitor='val_loss', patience=30, restore_best_weights=True)
        model_checkpoint = ModelCheckpoint(f'best_model_fold{fold}.keras', save_best_only=True, monitor='val_loss')
        reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=10, min_lr=0.0001)
        
        history = model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=150,
            batch_size=32,
            callbacks=[early_stopping, model_checkpoint, reduce_lr],
            verbose=0
        )
        histories.append((fold, history))  # Append fold number along with history
    
    return histories

# Generate future predictions
def generate_future_predictions(model, last_sequence, scaler, future_steps, n_features):
    future_predictions = []
    
    for _ in range(future_steps):
        future_pred = model.predict(last_sequence.reshape(1, last_sequence.shape[0], n_features), verbose=0)
        future_predictions.append(future_pred[0, 0])
        
        new_row = np.zeros(n_features)
        new_row[0] = future_pred[0, 0]  # Adj Close
        new_row[1] = np.mean(last_sequence[-50:, 0])  # Simple Moving Average_50
        new_row[2] = np.mean(last_sequence[-100:, 0])  # Simple Moving Average_100 (approx.: only time_step=60 rows are available)
        new_row[3] = (2 * future_pred[0, 0] + 49 * last_sequence[-1, 3]) / 51  # Exponential Moving Average_50 (alpha = 2/51)
        new_row[4] = (2 * future_pred[0, 0] + 99 * last_sequence[-1, 4]) / 101  # Exponential Moving Average_100 (alpha = 2/101)
        new_row[5] = last_sequence[-1, 5]  # Relative Strength Index_14 (held constant, simplified)
        new_row[6] = (future_pred[0, 0] - last_sequence[-1, 0]) / last_sequence[-1, 0]  # Daily_Return (computed on scaled prices)
        new_row[7] = last_sequence[-1, 7]  # Volume (held constant, simplified)
        new_row[8] = new_row[3] - new_row[4]  # MACD
        new_row[9] = (2 * new_row[8] + 8 * last_sequence[-1, 9]) / 10  # Signal Line (alpha = 2/10, i.e. span 9)
        
        last_sequence = np.vstack((last_sequence[1:], new_row))
    
    future_predictions = scaler.inverse_transform(np.column_stack((future_predictions, np.zeros((len(future_predictions), n_features-1)))))[:, 0]
    return future_predictions

# Plot results
def plot_results(data, future_predictions, future_dates):
    plt.figure(figsize=(20, 10))
    sns.set_style("whitegrid")
    
    plt.plot(data.index, data['Adj Close'], label='Historical Price', color='blue', linewidth=2)
    plt.plot(future_dates, future_predictions, label='Future Predictions (180 Days)', color='red', linewidth=2)
    
    # ±2σ of the predicted path: a rough visual band, not a true forecast interval
    std_dev = np.std(future_predictions)
    plt.fill_between(future_dates, 
                     future_predictions - 2*std_dev, 
                     future_predictions + 2*std_dev, 
                     color='red', alpha=0.2, label='±2σ Band')
    
    plt.title('Stock Price Prediction - Next 180 Days', fontsize=20)
    plt.xlabel('Date', fontsize=16)
    plt.ylabel('Price', fontsize=16)
    plt.legend(fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    
    last_historical_price = data['Adj Close'].iloc[-1]
    final_predicted_price = future_predictions[-1]
    plt.annotate(f'Last Historical Price: ${last_historical_price:.2f}', 
                 xy=(data.index[-1], last_historical_price),
                 xytext=(10, 10), textcoords='offset points', fontsize=12,
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
    plt.annotate(f'Final Predicted Price: ${final_predicted_price:.2f}', 
                 xy=(future_dates[-1], final_predicted_price),
                 xytext=(10, -10), textcoords='offset points', fontsize=12,
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
    
    plt.tight_layout()
    plt.show()

# Plot training history
def plot_training_history(histories):
    plt.figure(figsize=(20, 10))
    sns.set_style("whitegrid")
    
    num_folds = len(histories)
    train_colors = sns.color_palette("hsv", num_folds)
    val_colors = sns.color_palette("husl", num_folds)
    
    for (fold, history), train_color, val_color in zip(histories, train_colors, val_colors):
        plt.plot(history.history['loss'], label=f'Train Loss Fold {fold}', color=train_color, linestyle='--', linewidth=2)
        plt.plot(history.history['val_loss'], label=f'Validation Loss Fold {fold}', color=val_color, linewidth=2)
    
    plt.title('Model Training History', fontsize=20)
    plt.xlabel('Epoch', fontsize=16)
    plt.ylabel('Loss', fontsize=16)
    plt.legend(fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    
    plt.tight_layout()
    plt.show()

# Main execution
file_path = r'../P2_Stocks/MSFT/MSFT.csv'
data, scaled_data, scaler = load_and_preprocess_data(file_path)

time_step = 60
X, y = create_dataset(scaled_data, time_step)

model = build_model((time_step, scaled_data.shape[1]))
histories = train_model(X, y, model)

future_steps = 180  # forecast horizon: 180 business days
last_sequence = scaled_data[-time_step:]
future_predictions = generate_future_predictions(model, last_sequence, scaler, future_steps, scaled_data.shape[1])

future_dates = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1), periods=future_steps, freq='B')

plot_results(data, future_predictions, future_dates)
plot_training_history(histories)
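`train_model` relies on scikit-learn's TimeSeriesSplit, which, unlike ordinary k-fold, keeps every validation fold strictly after its (expanding) training window, so the model is never validated on data it could have "seen". A toy illustration with 12 ordered samples and 3 splits:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 time-ordered samples
tscv = TimeSeriesSplit(n_splits=3)

folds = []
for train_idx, val_idx in tscv.split(X):
    assert train_idx.max() < val_idx.min()  # no look-ahead leakage
    folds.append((train_idx.tolist(), val_idx.tolist()))

print(folds)
# → [([0, 1, 2], [3, 4, 5]),
#    ([0, 1, 2, 3, 4, 5], [6, 7, 8]),
#    ([0, 1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11])]
```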
[Figure: MSFT historical price with 180-day LSTM forecast]
[Figure: LSTM training and validation loss per fold]

NVDA Stock Price Prediction

Skewed Bell Curve

In [162]:
def load_data(file_path):
    data = pd.read_csv(file_path, parse_dates=['Date'], index_col='Date')
    return data['Close']

def calculate_statistics(data):
    return {
        'mean': data.mean(),
        'median': data.median(),
        'std_dev': data.std(),
        'skewness': skew(data),
        'kurtosis': kurtosis(data)
    }

def plot_skewed_distribution(data, title):
    stats = calculate_statistics(data)
    
    # Determine kurtosis description: scipy's kurtosis() returns *excess*
    # kurtosis (normal = 0), so > 0 corresponds to a raw kurtosis > 3
    if stats['kurtosis'] > 0:
        kurt_desc = "Leptokurtic (> 3)"
    elif stats['kurtosis'] < 0:
        kurt_desc = "Platykurtic (< 3)"
    else:
        kurt_desc = "Mesokurtic (= 3)"
    
    # Create the plot
    plt.figure(figsize=(12, 8))
    
    # Plot histogram
    sns.histplot(data, kde=False, stat="density", bins=50, color='skyblue', alpha=0.6)
    
    # Overlay a skewed normal curve; using the sample skewness as skewnorm's
    # shape parameter is a rough heuristic (skewnorm.fit would give an MLE fit)
    x = np.linspace(data.min(), data.max(), 1000)
    skewed_normal = skewnorm.pdf(x, stats['skewness'], loc=stats['mean'], scale=stats['std_dev'])
    plt.plot(x, skewed_normal, 'r', linewidth=2, label='Fitted Skewed Normal')
    
    # Add mean and median lines
    plt.axvline(stats['mean'], color='green', linestyle='--', linewidth=2, label='Mean')
    plt.axvline(stats['median'], color='purple', linestyle='--', linewidth=2, label='Median')
    
    # Customize the plot
    plt.title(title, fontsize=20, fontweight='bold')
    plt.xlabel('Stock Price', fontsize=14)
    plt.ylabel('Density', fontsize=14)
    plt.legend(fontsize=12)
    
    # Add text box with statistics and kurtosis description
    stats_text = "\n".join(f"{k.capitalize()}: {v:.2f}" for k, v in stats.items())
    stats_text += f"\nKurtosis: {kurt_desc}"
    plt.text(0.95, 0.95, stats_text, transform=plt.gca().transAxes, 
             verticalalignment='top', horizontalalignment='right',
             bbox=dict(boxstyle='round', facecolor='white', alpha=0.8),
             fontsize=12)
    
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()

def main():
    file_path = '../P2_Stocks/NVDA/NVDA.csv'  # Update this path as needed
    data = load_data(file_path)
    plot_skewed_distribution(data, 'Distribution of NVDA Stock Prices')

if __name__ == "__main__":
    main()
[Figure: distribution of NVDA closing prices with fitted skewed normal]

Generalized Autoregressive Conditional Heteroskedasticity (GARCH) Model

In [163]:
def load_data(file_path):
    data = pd.read_csv(file_path, parse_dates=['Date'], index_col='Date')
    return data['Close']

def calculate_daily_returns(data):
    daily_returns = data.pct_change().dropna()
    return daily_returns

def fit_garch_model(daily_returns):
    scaled_returns = daily_returns * 100
    model = arch_model(scaled_returns, vol='Garch', p=1, q=1, rescale=False)
    model_fit = model.fit(disp='off')
    return model_fit

def plot_garch_results(daily_returns, model_fit):
    plt.figure(figsize=(12, 6))
    plt.plot(daily_returns, label='Daily Returns', color='#698474')
    plt.plot(model_fit.conditional_volatility / 100, color='#FF4191', label='Conditional Volatility') 
    plt.title('GARCH Model Results')
    plt.xlabel('Date')
    plt.ylabel('Daily Return / Conditional Volatility')
    plt.legend()
    plt.grid(True)
    plt.show()

def main():
    file_path = '../P2_Stocks/NVDA/NVDA.csv'  # Update this path as needed
    data = load_data(file_path)
    
    daily_returns = calculate_daily_returns(data)
    
    model_fit = fit_garch_model(daily_returns)
    # print(model_fit.summary())  # Commented out to suppress the output
    
    plot_garch_results(daily_returns, model_fit)

if __name__ == "__main__":
    main()
[Figure: NVDA daily returns with GARCH(1,1) conditional volatility]

180-Day Prediction (LSTM)

In [164]:
# Load and preprocess data
def load_and_preprocess_data(file_path):
    data = pd.read_csv(file_path, index_col='Date', parse_dates=True)
    
    # Calculate MACD and Signal Line
    ema_50 = data['Exponential Moving Average_50']
    ema_100 = data['Exponential Moving Average_100']
    data['MACD'] = ema_50 - ema_100
    data['Signal_Line'] = data['MACD'].ewm(span=9, adjust=False).mean()
    
    features = ['Close', 'Simple Moving Average_50', 'Simple Moving Average_100', 
                'Exponential Moving Average_50', 'Exponential Moving Average_100', 
                'Relative Strength Index 14', 'Daily_Return', 'Volume', 'MACD', 'Signal_Line']
    data = data[features].dropna()
    
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_data = scaler.fit_transform(data)
    
    return data, scaled_data, scaler

# Create dataset for LSTM
def create_dataset(dataset, time_step=60):
    X, y = [], []
    for i in range(len(dataset) - time_step):
        X.append(dataset[i:(i + time_step)])
        y.append(dataset[i + time_step, 0])  # Predicting 'Close'
    return np.array(X), np.array(y)

# Build LSTM model
def build_model(input_shape):
    inputs = Input(shape=input_shape)
    x = LSTM(64, return_sequences=True, kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4))(inputs)
    x = Dropout(0.2)(x)
    x = LSTM(32, return_sequences=False, kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4))(x)
    x = Dropout(0.2)(x)
    x = Dense(16, activation='relu')(x)
    outputs = Dense(1)(x)
    
    model = Model(inputs=inputs, outputs=outputs)
    
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss=Huber(), metrics=[RootMeanSquaredError(), MeanAbsolutePercentageError()])
    return model

# Train model with cross-validation
def train_model(X, y, model, n_splits=5):
    tscv = TimeSeriesSplit(n_splits=n_splits)
    histories = []
    
    for fold, (train_index, val_index) in enumerate(tscv.split(X), 1):
        X_train, X_val = X[train_index], X[val_index]
        y_train, y_val = y[train_index], y[val_index]
        
        early_stopping = EarlyStopping(monitor='val_loss', patience=30, restore_best_weights=True)
        model_checkpoint = ModelCheckpoint(f'best_model_fold{fold}.keras', save_best_only=True, monitor='val_loss')
        reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=10, min_lr=0.0001)
        
        history = model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=150,
            batch_size=32,
            callbacks=[early_stopping, model_checkpoint, reduce_lr],
            verbose=0
        )
        histories.append((fold, history))  # Append fold number along with history
    
    return histories

# Generate future predictions
def generate_future_predictions(model, last_sequence, scaler, future_steps, n_features):
    future_predictions = []
    
    for _ in range(future_steps):
        future_pred = model.predict(last_sequence.reshape(1, last_sequence.shape[0], n_features), verbose=0)
        future_predictions.append(future_pred[0, 0])
        
        new_row = np.zeros(n_features)
        new_row[0] = future_pred[0, 0]  # Close
        new_row[1] = np.mean(last_sequence[-50:, 0])  # Simple Moving Average_50
        new_row[2] = np.mean(last_sequence[-100:, 0])  # Simple Moving Average_100 (approx.: only time_step=60 rows are available)
        new_row[3] = (2 * future_pred[0, 0] + 49 * last_sequence[-1, 3]) / 51  # Exponential Moving Average_50 (alpha = 2/51)
        new_row[4] = (2 * future_pred[0, 0] + 99 * last_sequence[-1, 4]) / 101  # Exponential Moving Average_100 (alpha = 2/101)
        new_row[5] = last_sequence[-1, 5]  # Relative Strength Index_14 (held constant, simplified)
        new_row[6] = (future_pred[0, 0] - last_sequence[-1, 0]) / last_sequence[-1, 0]  # Daily_Return (computed on scaled prices)
        new_row[7] = last_sequence[-1, 7]  # Volume (held constant, simplified)
        new_row[8] = new_row[3] - new_row[4]  # MACD
        new_row[9] = (2 * new_row[8] + 8 * last_sequence[-1, 9]) / 10  # Signal Line (alpha = 2/10, i.e. span 9)
        
        last_sequence = np.vstack((last_sequence[1:], new_row))
    
    future_predictions = scaler.inverse_transform(np.column_stack((future_predictions, np.zeros((len(future_predictions), n_features-1)))))[:, 0]
    return future_predictions

# Plot results
def plot_results(data, future_predictions, future_dates):
    plt.figure(figsize=(20, 10))
    sns.set_style("whitegrid")
    
    plt.plot(data.index, data['Close'], label='Historical Price', color='blue', linewidth=2)
    plt.plot(future_dates, future_predictions, label='Future Predictions (180 Days)', color='red', linewidth=2)
    
    # 2-standard-deviation band of the predicted path (a rough uncertainty proxy, not a true confidence interval)
    std_dev = np.std(future_predictions)
    plt.fill_between(future_dates, 
                     future_predictions - 2*std_dev, 
                     future_predictions + 2*std_dev, 
                     color='red', alpha=0.2, label='±2 Std Dev Band')
    
    plt.title('Stock Price Prediction - Next 180 Days', fontsize=20)
    plt.xlabel('Date', fontsize=16)
    plt.ylabel('Price', fontsize=16)
    plt.legend(fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    
    last_historical_price = data['Close'].iloc[-1]
    final_predicted_price = future_predictions[-1]
    plt.annotate(f'Last Historical Price: ${last_historical_price:.2f}', 
                 xy=(data.index[-1], last_historical_price),
                 xytext=(10, 10), textcoords='offset points', fontsize=12,
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
    plt.annotate(f'Final Predicted Price: ${final_predicted_price:.2f}', 
                 xy=(future_dates[-1], final_predicted_price),
                 xytext=(10, -10), textcoords='offset points', fontsize=12,
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
    
    plt.tight_layout()
    plt.show()

# Plot training history
def plot_training_history(histories):
    plt.figure(figsize=(20, 10))
    sns.set_style("whitegrid")
    
    num_folds = len(histories)
    train_colors = sns.color_palette("hsv", num_folds)
    val_colors = sns.color_palette("husl", num_folds)
    
    for (fold, history), train_color, val_color in zip(histories, train_colors, val_colors):
        plt.plot(history.history['loss'], label=f'Train Loss Fold {fold}', color=train_color, linestyle='--', linewidth=2)
        plt.plot(history.history['val_loss'], label=f'Validation Loss Fold {fold}', color=val_color, linewidth=2)
    
    plt.title('Model Training History', fontsize=20)
    plt.xlabel('Epoch', fontsize=16)
    plt.ylabel('Loss', fontsize=16)
    plt.legend(fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    
    plt.tight_layout()
    plt.show()

# Main execution (continued)
file_path = r'../P2_Stocks/NVDA/NVDA.csv'
data, scaled_data, scaler = load_and_preprocess_data(file_path)

time_step = 60
X, y = create_dataset(scaled_data, time_step)

model = build_model((time_step, scaled_data.shape[1]))
histories = train_model(X, y, model)

future_steps = 180  # forecast horizon in business days
last_sequence = scaled_data[-time_step:]
future_predictions = generate_future_predictions(model, last_sequence, scaler, future_steps, scaled_data.shape[1])

future_dates = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1), periods=future_steps, freq='B')

plot_results(data, future_predictions, future_dates)
plot_training_history(histories)
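The walk-forward scheme in `train_model` relies on `sklearn`'s `TimeSeriesSplit`, which produces expanding training windows whose validation folds never precede the training data, so the model is never validated on the past. A minimal illustration of the split indices on ten time-ordered samples:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten time-ordered samples, three walk-forward folds
X = np.arange(10).reshape(-1, 1)
tscv = TimeSeriesSplit(n_splits=3)
splits = [(train.tolist(), val.tolist()) for train, val in tscv.split(X)]
# Training windows expand; each validation fold lies strictly after its training window
```

With defaults, this yields train `[0..3]` / val `[4, 5]`, then train `[0..5]` / val `[6, 7]`, then train `[0..7]` / val `[8, 9]`.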

TSLA Stock Price Prediction

Skewed Bell Curve

In [18]:
def load_data(file_path):
    data = pd.read_csv(file_path, parse_dates=['Date'], index_col='Date')
    return data['Close']

def calculate_statistics(data):
    return {
        'mean': data.mean(),
        'median': data.median(),
        'std_dev': data.std(),
        'skewness': skew(data),
        'kurtosis': kurtosis(data)
    }

def plot_skewed_distribution(data, title):
    stats = calculate_statistics(data)
    
    # Describe kurtosis (scipy's kurtosis() returns excess kurtosis, i.e. kurtosis minus 3)
    if stats['kurtosis'] > 0:
        kurt_desc = "Leptokurtic (excess kurtosis > 0)"
    elif stats['kurtosis'] < 0:
        kurt_desc = "Platykurtic (excess kurtosis < 0)"
    else:
        kurt_desc = "Mesokurtic (excess kurtosis = 0)"
    
    # Create the plot
    plt.figure(figsize=(12, 8))
    
    # Plot histogram
    sns.histplot(data, kde=False, stat="density", bins=50, color='skyblue', alpha=0.6)
    
    # Fit and plot skewed normal distribution
    x = np.linspace(data.min(), data.max(), 1000)
    skewed_normal = skewnorm.pdf(x, stats['skewness'], loc=stats['mean'], scale=stats['std_dev'])
    plt.plot(x, skewed_normal, 'r', linewidth=2, label='Fitted Skewed Normal')
    
    # Add mean and median lines
    plt.axvline(stats['mean'], color='green', linestyle='--', linewidth=2, label='Mean')
    plt.axvline(stats['median'], color='purple', linestyle='--', linewidth=2, label='Median')
    
    # Customize the plot
    plt.title(title, fontsize=20, fontweight='bold')
    plt.xlabel('Stock Price', fontsize=14)
    plt.ylabel('Density', fontsize=14)
    plt.legend(fontsize=12)
    
    # Add text box with statistics and kurtosis description
    stats_text = "\n".join(f"{k.capitalize()}: {v:.2f}" for k, v in stats.items())
    stats_text += f"\nKurtosis type: {kurt_desc}"
    plt.text(0.95, 0.95, stats_text, transform=plt.gca().transAxes, 
             verticalalignment='top', horizontalalignment='right',
             bbox=dict(boxstyle='round', facecolor='white', alpha=0.8),
             fontsize=12)
    
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()

def main():
    file_path = '../P2_Stocks/TSLA/TSLA.csv'  # Update this path as needed
    data = load_data(file_path)
    plot_skewed_distribution(data, 'Distribution of TSLA Stock Prices')

if __name__ == "__main__":
    main()
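One caveat on the cell above: `plot_skewed_distribution` passes the sample skewness directly as `skewnorm`'s shape parameter, which is only a rough heuristic — the shape parameter `a` is not the skewness itself. scipy can instead estimate shape, location, and scale jointly by maximum likelihood; a sketch on synthetic (not TSLA) data:

```python
import numpy as np
from scipy.stats import skewnorm

# Synthetic right-skewed sample with known parameters (illustrative only)
rng = np.random.default_rng(42)
sample = skewnorm.rvs(a=4, loc=100, scale=15, size=5000, random_state=rng)

# Maximum-likelihood fit of (shape a, loc, scale), rather than plugging
# the sample skewness in as the shape parameter
a_hat, loc_hat, scale_hat = skewnorm.fit(sample)
```

The fitted `(a_hat, loc_hat, scale_hat)` can then be fed to `skewnorm.pdf` for the overlay curve.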

Generalized Autoregressive Conditional Heteroskedasticity (GARCH) Model

In [165]:
from arch import arch_model  # used below; not among the notebook-level imports

def load_data(file_path):
    data = pd.read_csv(file_path, parse_dates=['Date'], index_col='Date')
    return data['Close']

def calculate_daily_returns(data):
    daily_returns = data.pct_change().dropna()
    return daily_returns

def fit_garch_model(daily_returns):
    scaled_returns = daily_returns * 100
    model = arch_model(scaled_returns, vol='Garch', p=1, q=1, rescale=False)
    model_fit = model.fit(disp='off')
    return model_fit

def plot_garch_results(daily_returns, model_fit):
    plt.figure(figsize=(12, 6))
    plt.plot(daily_returns, label='Daily Returns', color='#698474')
    plt.plot(model_fit.conditional_volatility / 100, color='#FF4191', label='Conditional Volatility') 
    plt.title('GARCH Model Results')
    plt.xlabel('Date')
    plt.ylabel('Daily Return / Conditional Volatility')
    plt.legend()
    plt.grid(True)
    plt.show()

def main():
    file_path = '../P2_Stocks/TSLA/TSLA.csv'  # Update this path as needed
    data = load_data(file_path)
    
    daily_returns = calculate_daily_returns(data)
    
    model_fit = fit_garch_model(daily_returns)
    # print(model_fit.summary())  # Commented out to suppress the output
    
    plot_garch_results(daily_returns, model_fit)

if __name__ == "__main__":
    main()
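For intuition about the conditional volatility plotted above: the GARCH(1,1) model that `arch_model(..., p=1, q=1)` estimates lets today's variance depend on yesterday's squared return and yesterday's variance. The recursion can be simulated in plain NumPy; the coefficients below are illustrative, not the fitted TSLA values:

```python
import numpy as np

# Illustrative GARCH(1,1) parameters (NOT the fitted values from the cell above)
omega, alpha, beta = 0.05, 0.08, 0.90

rng = np.random.default_rng(0)
n = 500
returns = np.empty(n)
sigma2 = np.empty(n)

# Start from the unconditional variance omega / (1 - alpha - beta)
sigma2[0] = omega / (1 - alpha - beta)
returns[0] = np.sqrt(sigma2[0]) * rng.standard_normal()

for t in range(1, n):
    # Conditional variance: sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}
    sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    returns[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
```

Because each term in the recursion is non-negative, the conditional variance never drops below `omega`, which is why the volatility line in the plot stays strictly positive.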

180-Day Prediction (LSTM)

In [167]:
# Additional imports used by this cell (not among the notebook-level imports)
from keras.models import Model
from keras.layers import Dropout
from keras.regularizers import l1_l2
from keras.optimizers import Adam
from keras.losses import Huber
from keras.metrics import RootMeanSquaredError, MeanAbsolutePercentageError
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from sklearn.model_selection import TimeSeriesSplit

# Load and preprocess data
def load_and_preprocess_data(file_path):
    data = pd.read_csv(file_path, index_col='Date', parse_dates=True)
    
    # Calculate MACD and Signal Line
    ema_50 = data['Exponential Moving Average_50']
    ema_100 = data['Exponential Moving Average_100']
    data['MACD'] = ema_50 - ema_100
    data['Signal_Line'] = data['MACD'].ewm(span=9, adjust=False).mean()
    
    features = ['Close', 'Simple Moving Average_50', 'Simple Moving Average_100', 
                'Exponential Moving Average_50', 'Exponential Moving Average_100', 
                'Relative Strength Index 14', 'Daily_Return', 'Volume', 'MACD', 'Signal_Line']
    data = data[features].dropna()
    
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_data = scaler.fit_transform(data)
    
    return data, scaled_data, scaler

# Create dataset for LSTM
def create_dataset(dataset, time_step=60):
    X, y = [], []
    for i in range(len(dataset) - time_step):
        X.append(dataset[i:(i + time_step)])
        y.append(dataset[i + time_step, 0])  # Predicting 'Close'
    return np.array(X), np.array(y)

# Build LSTM model
def build_model(input_shape):
    inputs = Input(shape=input_shape)
    x = LSTM(64, return_sequences=True, kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4))(inputs)
    x = Dropout(0.2)(x)
    x = LSTM(32, return_sequences=False, kernel_regularizer=l1_l2(l1=1e-5, l2=1e-4))(x)
    x = Dropout(0.2)(x)
    x = Dense(16, activation='relu')(x)
    outputs = Dense(1)(x)
    
    model = Model(inputs=inputs, outputs=outputs)
    
    optimizer = Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss=Huber(), metrics=[RootMeanSquaredError(), MeanAbsolutePercentageError()])
    return model

# Train model with cross-validation
def train_model(X, y, model, n_splits=5):
    tscv = TimeSeriesSplit(n_splits=n_splits)
    histories = []
    
    for fold, (train_index, val_index) in enumerate(tscv.split(X), 1):
        X_train, X_val = X[train_index], X[val_index]
        y_train, y_val = y[train_index], y[val_index]
        
        early_stopping = EarlyStopping(monitor='val_loss', patience=30, restore_best_weights=True)
        model_checkpoint = ModelCheckpoint(f'best_model_fold{fold}.keras', save_best_only=True, monitor='val_loss')
        reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=10, min_lr=0.0001)
        
        history = model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=150,
            batch_size=32,
            callbacks=[early_stopping, model_checkpoint, reduce_lr],
            verbose=0
        )
        histories.append((fold, history))  # Append fold number along with history
    
    return histories

# Generate future predictions
def generate_future_predictions(model, last_sequence, scaler, future_steps, n_features):
    future_predictions = []
    
    for _ in range(future_steps):
        future_pred = model.predict(last_sequence.reshape(1, last_sequence.shape[0], n_features), verbose=0)
        future_predictions.append(future_pred[0, 0])
        
        new_row = np.zeros(n_features)
        new_row[0] = future_pred[0, 0]  # Close
        new_row[1] = np.mean(last_sequence[-50:, 0])  # Simple Moving Average_50
        new_row[2] = np.mean(last_sequence[-100:, 0])  # Simple Moving Average_100 (window capped at the 60 available rows)
        new_row[3] = (2 * future_pred[0, 0] + 49 * last_sequence[-1, 3]) / 51  # Exponential Moving Average_50
        new_row[4] = (2 * future_pred[0, 0] + 99 * last_sequence[-1, 4]) / 101  # Exponential Moving Average_100
        new_row[5] = last_sequence[-1, 5]  # Relative Strength Index_14 (simplified)
        new_row[6] = (future_pred[0, 0] - last_sequence[-1, 0]) / last_sequence[-1, 0]  # Daily_Return
        new_row[7] = last_sequence[-1, 7]  # volume (simplified)
        new_row[8] = new_row[3] - new_row[4]  # MACD
        new_row[9] = (2 * new_row[8] + 8 * last_sequence[-1, 9]) / 10  # Signal Line
        
        last_sequence = np.vstack((last_sequence[1:], new_row))
    
    future_predictions = scaler.inverse_transform(np.column_stack((future_predictions, np.zeros((len(future_predictions), n_features-1)))))[:, 0]
    return future_predictions

# Plot results
def plot_results(data, future_predictions, future_dates):
    plt.figure(figsize=(20, 10))
    sns.set_style("whitegrid")
    
    plt.plot(data.index, data['Close'], label='Historical Price', color='blue', linewidth=2)
    plt.plot(future_dates, future_predictions, label='Future Predictions (180 Days)', color='red', linewidth=2)
    
    # 2-standard-deviation band of the predicted path (a rough uncertainty proxy, not a true confidence interval)
    std_dev = np.std(future_predictions)
    plt.fill_between(future_dates, 
                     future_predictions - 2*std_dev, 
                     future_predictions + 2*std_dev, 
                     color='red', alpha=0.2, label='±2 Std Dev Band')
    
    plt.title('Stock Price Prediction - Next 180 Days', fontsize=20)
    plt.xlabel('Date', fontsize=16)
    plt.ylabel('Price', fontsize=16)
    plt.legend(fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    
    last_historical_price = data['Close'].iloc[-1]
    final_predicted_price = future_predictions[-1]
    plt.annotate(f'Last Historical Price: ${last_historical_price:.2f}', 
                 xy=(data.index[-1], last_historical_price),
                 xytext=(10, 10), textcoords='offset points', fontsize=12,
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
    plt.annotate(f'Final Predicted Price: ${final_predicted_price:.2f}', 
                 xy=(future_dates[-1], final_predicted_price),
                 xytext=(10, -10), textcoords='offset points', fontsize=12,
                 arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
    
    plt.tight_layout()
    plt.show()

# Plot training history
def plot_training_history(histories):
    plt.figure(figsize=(20, 10))
    sns.set_style("whitegrid")
    
    num_folds = len(histories)
    train_colors = sns.color_palette("hsv", num_folds)
    val_colors = sns.color_palette("husl", num_folds)
    
    for (fold, history), train_color, val_color in zip(histories, train_colors, val_colors):
        plt.plot(history.history['loss'], label=f'Train Loss Fold {fold}', color=train_color, linestyle='--', linewidth=2)
        plt.plot(history.history['val_loss'], label=f'Validation Loss Fold {fold}', color=val_color, linewidth=2)
    
    plt.title('Model Training History', fontsize=20)
    plt.xlabel('Epoch', fontsize=16)
    plt.ylabel('Loss', fontsize=16)
    plt.legend(fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    
    plt.tight_layout()
    plt.show()

# Main execution (continued)
file_path = r'../P2_Stocks/TSLA/TSLA.csv'
data, scaled_data, scaler = load_and_preprocess_data(file_path)

time_step = 60
X, y = create_dataset(scaled_data, time_step)

model = build_model((time_step, scaled_data.shape[1]))
histories = train_model(X, y, model)

future_steps = 180  # forecast horizon in business days
last_sequence = scaled_data[-time_step:]
future_predictions = generate_future_predictions(model, last_sequence, scaler, future_steps, scaled_data.shape[1])

future_dates = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1), periods=future_steps, freq='B')

plot_results(data, future_predictions, future_dates)
plot_training_history(histories)

Final Conclusion

After an in-depth analysis of each ticker, combining correlation studies, financial-health metrics, and predictive models to describe relationships and forecast upcoming movements, the conclusions are as follows:

AAPL: Sell Recommendation, as the price is expected to decrease with low volatility in the upcoming periods.

ADBE: Buy Recommendation, as the price is expected to increase with high volatility in the upcoming periods.

AMZN: Sell Recommendation, as the price is expected to decrease with low volatility in the upcoming periods.

MSFT: Sell Recommendation, as the price is expected to decrease with low volatility in the upcoming periods.

NVDA: Sell Recommendation, as the price is expected to decrease with low volatility in the upcoming periods.

TSLA: Buy Recommendation, as the price is expected to increase with moderate volatility in the upcoming periods.